FOREWORD


As per the 2001 Census, the Austro-Asiatic language family,
with a total population of 11,442,029, comprising 1.11% of
India’s population, is a storehouse of over 11% of the nation’s
linguistic diversity, for it has 14 of the 122 languages and 21 of
the 234 mother tongues (the total number of mother tongues
for all families listed in the Census as spoken by over 10,000).
However, it does have several other mother tongues which are
returned but, as per Census policy, not listed because they have
fewer than 10,000 speakers, and which are of no less importance
to us linguists. The spread of the languages reveals that their
natural habitat is predominantly central India, with Khasi as the
lone member in the Northeast; but with linguists making out a
case for Nicobarese as a member of the family, it is also spread
across the seas and has a disjunct distribution.

Santali is the sole representative of the Austro-Asiatic
family in the list of 22 languages included in the Eighth
Schedule. All the languages are tribal languages, for they
belong to the Adivasis, who have links that go back to times
immemorial. These languages are primarily ‘spoken’, with
Santali alone having literary records of substance, and even these
are often less than 100 years old and in different scripts.
Yet there is no denying that the speakers of these languages
have been a most integrative lot, for they have allowed their
expression systems to be shaped by their ecology, where other
language families (Indo-Aryan and Dravidian in particular, in
central India) have converged with them and contributed most
to the formation of India as a linguistic area with grassroots
multilingualism.

Documentation, Description and Development go
together in language work, and for nurturing diversity these are
the key components of language planning. Each variety is a
unique record of its ancestry, cultural moorings and social
history, besides being a unique window into the functioning of
the human mind. Austro-Asiatic languages are under-described,
partially documented and least developed languages which
must receive all attention. The proceedings of the conference
held a few years back at Deccan College, Pune should
prove invaluable to those looking to strengthen the study of
these languages. I am sure my friend and colleague Elangaiyan
would have liked to bring out these proceedings under his own
watchful eye, but fate has ordained it otherwise. I fondly
recollect all the warm-hearted discussions we have had in the
past, and I am particularly grateful that Nagaraja and Kashyap
have put the manuscript together as a memorial volume. I also
hope that the work which Elangaiyan, a language missionary with
compassion, had begun on the Nicobarese language will
see the light of day some time in the future; our Institute
will remain ever supportive in that endeavour.

I also hope we will strive on to do more for these 
languages. Santali is being linked to the National Translation 
Mission to produce modernized discourses in various 
disciplines and it is also linked to the language technology unit. 
Khasi too is taking rapid strides in Meghalaya where it is an 
associate official language. The Sarva Shiksha Abhiyan is
creating new opportunities for languages like Mundari or 
Kharia and the Nicobarese primers prepared by our Institute 
have already become useful in schools. The work that awaits 
us is of course a lot more and we should strive on...and on. 



Rajesh Sachdeva 

Director Incharge 
In Memoriam

RATHINASABAPATHY ELANGAIYAN


RATHINASABAPATHY ELANGAIYAN (R. Elangaiyan) was
born to Smt. T. Meenaambal and Shri Ratinasabhapathy on
12th April 1949 at Adupukattai in Madurai district of
Tamilnadu State. After completing his graduation at VirudhuNagar
(TN), he moved to Thiruvananthapuram in Kerala State for his
master’s degree. He took his master’s degree in Linguistics
from the University of Kerala (India) in 1973. He took up a
research project on the Dhangar Kurux/Kurukh language (spoken
in the Nepal Tarai region) at Deccan College, Pune in India with
the aim of submitting a doctoral thesis, but did not complete the
work for personal reasons. In 1981, he joined the Central
Institute of Indian Languages (CIIL), Mysore as Research
Assistant in the Tribal and Border Languages Unit, which was
renamed in the year 2001 as the Research Group for Tribal and
Endangered Languages. He did a survey of Kurux dialects
spoken in Central India and its diaspora in the non-contiguous
areas. The findings of this survey were used in preparing
Kurux primers for use in schools. He guided the
Car Nicobarese mother tongue teachers and coauthored with
them school primers in the Car Nicobarese
language, which are used in schools. He conducted
several linguistic training programmes for teaching linguistics
to Language Officers and University teachers in various states
in India. He was working on the grammars of the Car Nicobarese
language and Idu Mishmi, an endangered language spoken
near the Indo-Chinese border. He studied the phonology of Idu
Mishmi and adopted the Roman script for that language. He
was interested in studies on language endangerment,
ethnolinguistics, translations (he knew six Indian languages -
Tamil, Malayalam, Kurux, Car Nicobarese, Kannada and
Hindi) and language planning with special accent on ‘term’
(terminology) planning. He coordinated and conducted a
Post-Conference Seminar in the ICOSAL-3 Conference on
‘Language Endangerment and South Asia’ at Hyderabad in
January 2001, and later, in January 2005, he conducted a
Symposium on Globalization and Language Endangerment in
the XXXIII Indian Social Science Conference in Gandhigram.
In the beginning of 2006 he conducted a workshop in Ranchi
(Central India) on term planning in Kurux, to facilitate the
writing of a Kurux grammar in Kurux by native Kurux scholars.
In the same year he was instrumental in the organization of the
Foundation for Endangered Languages’ X International
Conference on Vital Voices: Endangered Languages and
Multilingualism at CIIL, Mysore between 25-27 October 2006.
He brought out a special volume on that occasion. He was the
joint coordinator of the 3rd International Conference of
Austroasiatic Linguistics held at Deccan College, Pune
between 26-28 November 2007. In this conference Shri
Elangaiyan’s initiative saw a record participation of the Munda
scholars.

He was an ardent supporter of pluralism of all sorts -
religion, language, culture, politics etc. As an activist for
pluralism he had been working to build awareness among
different communities (within his reach) for this purpose.

His sudden and unexpected demise on 18th January 2008
at Mysore has left the field of tribal and endangered
languages like a rudderless ship.

He is survived by his wife, Lalitha and daughter Thalir 
Meenal. 

May his soul remain in peace. 





EDITORIAL NOTE 



It gives us great pleasure to place before scholars this
volume containing the proceedings of the 3rd International
Conference of Austroasiatic Linguistics (ICAAL3). The
conference was organized by the Department of Linguistics,
Deccan College Post-graduate & Research Institute, Pune in
collaboration with the Central Institute of Indian Languages,
Mysore and the Linguistic Society of India.

With the efforts of Wilhelm Schmidt in the early 20th
century, the language groups now identified as ‘Austroasiatic’ 
and ‘Austronesian’ came to be recognized. Subsequently, with 
further work on these languages by various scholars the 
establishment of these two groups as separate families became 
acceptable. The present conference was on Austroasiatic 
languages only. 

The first ICAAL was held at the University of Hawaii, Honolulu
in 1973. The proceedings of that conference were published in
1976, sumptuously, in two volumes. They reflected the great interest
that scholars had in this area. Five years later, the second
ICAAL was held at CIIL, Mysore, in December 1978. Though
the proceedings were to be published, this could not materialize
due to various reasons. After that there was a long lull in the
area.


The present (third) Conference was aimed at reviving the 
interest of scholars and students in this field. The conference 
hoped to bring together all the scholars working on these 
languages and deliberate about the continuity and the 
directions to be followed in this field. 

The idea of this conference was mooted during the
pre-conference picnic held at Siem Reap, Cambodia in June 2006,
organized by Gerard Diffloth with a view to reviving interest
in these languages. In the deliberations during that picnic, it
was suggested by the organizers that Deccan College at Pune,
India should host the forthcoming conference. That idea was
endorsed by other participants as well. After our return,
we met our colleagues in the department and spoke to them
about the outcome of the picnic. They and the other staff
whole-heartedly supported the idea and encouraged us to
discuss the matter with the Director. With that mandate we
met the Director, Professor K. Paddayya, and expressed our
willingness to hold the conference. The Director accepted the
idea, and assured us of all assistance from the Institute’s
authorities, sans funds.

With the Director’s approval, we contacted the Director of the
Central Institute of Indian Languages for funds. The Director,
Prof. Udaya Narayana Singh, readily accepted the idea and not
only promised financial support but also agreed to co-host the
conference. He also appointed Shri R. Elangaiyan, the only
Austroasiatic specialist at CIIL, as joint coordinator.

The conference notifications were released in the month of
June itself and the response was overwhelming. In all, ten
foreign scholars and more than twenty Indian scholars
participated in the conference. The Conference was held at the
Institute for three days, on 26, 27 and 28 November 2007. The
conference was inaugurated by Prof. Ram Dayal Munda, and
Prof. Gerard Diffloth delivered the keynote address, in which
he assessed the state of the homeland of the Austroasiatic people.
Dr. D.B. Deglurkar, the President of Deccan College, presided over
the function.

We are grateful to all the scholars who participated in the
conference, coming from different countries like Japan,
England, the Netherlands and Thailand, and from different parts of
India, and who made it a grand success. In all, 32 papers were
presented. In the current volume some selected papers have
been included.

We feel obliged to Prof. Ram Dayal Munda for accepting
our invitation to inaugurate the Conference, to Prof.
Gerard Diffloth for delivering the keynote address, and to Prof.
K. Paddayya, the Institute’s Director, for his constant guidance and
supervision in planning meticulously and for attending all the
sessions.

The present volume contains sixteen papers in all,
arranged in four sections. The first section contains the
transcripts of two speeches*, delivered by Profs. Gerard
Diffloth and Ram Dayal Munda. The second section contains
three papers on Mon-Khmer languages, by Paul Sidwell, Doug
Cooper and Sophana Srichampa. The third section contains
three papers on the languages spoken in the Nicobar Islands, while
the fourth and last section contains seven papers, preceded by
an introductory paper dealing with Austroasiatic languages.

We would like to take this opportunity to express our sense 
of gratitude to Central Institute of Indian Languages, Mysore; 
but for them this conference would not have been held. 
Shri. R.Elangaiyan of CIIL, Mysore was a great source of 
inspiration and support, whose involvement was responsible 
for the success of the conference. 

We thank all the participants of the conference and the
contributors to this volume for their whole-hearted support.
By their participation they have shown that the subject is very
much relevant and worth pursuing.

We are highly obliged to our Institute’s Dr. D.B. Deglurkar
(President), Prof. K. Paddayya (Director), Shri S.R. Kashikar
(Registrar), Shri N.S. Gaware (Deputy Registrar), Shri
Khedekar (Estate Manager), Shri B.S. Gajul (Store
keeper-in-charge), Smt. Trupti More (Librarian) and the Deccan College
Office of P.W.D. for their help and cooperation in making
various infrastructural arrangements for the conference. We
thank the chairpersons and rapporteurs of the various academic
sessions for conducting the deliberations of the conference in
an efficient manner. The faculty, technical and administrative
staff, and students of the Institute extended full support. We
are grateful to them.

This volume has become possible due to the support of the
Central Institute of Indian Languages, Mysore, which took the
responsibility of printing this volume. We also thank the
present director, Prof. Rajesh Sachdeva, for providing a
foreword for this volume, and Dr. Srinivasacharya for all the
follow-up action.

Last but not the least, our appreciation goes to all the staff
of the Press of CIIL, who were responsible for printing this
volume. We further thank them for their keen interest in
printing this volume.


The Editors 


*Diffloth’s transcript is basically the abstract he had submitted for the
conference, while Ram Dayal Munda’s is the actual transcription of the
recording. It was transcribed by Ms. Shubhangi Kardile of the Linguistics
Department, Deccan College.
AUSTROASIATIC LANGUAGES - AN
INTRODUCTION 1 

K.S.Nagaraja 

Deccan College Post-graduate & Research Institute, Pune 

The Austroasiatic language family is one of the five important
families found in the Indian subcontinent. The others are
Indo-Aryan, Dravidian, Tibeto-Burman and Andamanese. The
term Austroasiatic is composed of Latin ‘austro’, meaning
‘south’, and ‘-asiatic’, ‘Asia’. The speakers of this family are
scattered across South and South-east Asia, starting from the
central and eastern parts of India and spreading to Bangladesh,
Burma, southern China, Thailand, Laos, Cambodia, South and
North Vietnam and Malaysia. The languages of this family are
generally grouped into two sub-branches, namely, Munda and
Mon-Khmer. Nicobarese, which is included within
Mon-Khmer, used to be treated as a separate sub-branch. While the
Munda sub-branch is wholly located in the Indian
subcontinent, the Mon-Khmer branch is found in most of
South-east Asia, starting with eastern India.

The date and place of the origin of Austroasiatic are still
unknown. The place was probably southern or southeastern
China; the date was at least circa 2000-2500 B.C. and possibly
much earlier. From that location, AA speakers moved south
into the Indo-China peninsula and west into India. Their
southernmost expansion was the Nicobar Islands and the
southern tip of Malaysia; some may have penetrated Sumatra
and other islands, but this is uncertain. Their westernmost
expansion is unknown, but some argue that it extended into
modern Pakistan.

At one time, the AA domain probably extended unbroken 
from China to India to Malaysia, but invasion by speakers of 
other languages split the AA community and divided it into 
enclaves. The Dravidians and Indo-Aryans entered India and 
certainly overran much of the Munda territory, doubtlessly 
reducing the AA population and certainly leaving the Munda 
languages scattered in small groups. The Tibeto-Burmans and 
Tais invaded Indo-China and split the AA domain in two. The 
Chams and Malays invaded Vietnam and Malaysia, 
respectively. The Chinese occupied northern Vietnam for 
nearly a millennium, and an Indian aristocracy apparently 
ruled parts of modern Cambodia and Thailand for some time. 
As a result of such incursions, few nation states ever developed 
in the AA domain, the Khmer, Mon, and Vietnamese empires 
being the exceptions, and most of the AA speakers have lived 
and still live in small tribal groups. 

All of these events had an impact on the AA languages, 
and for that reason, these languages exhibit an uncommon
diversity that makes tracing their diachronic development 
extremely difficult. The Munda family has been influenced by 
a synthesizing, non-tonal Indian Sprachbund, the Mon-Khmer 
by an isolating, tonal Sinitic Sprachbund. In short, the two AA 
subfamilies have been pulled and consequently have evolved 
in different directions, which makes reconstructing their 
original phonology and grammar a real challenge for historical 
linguists. Due to such difficulties and the inaccessibility of 
many of the tribal groups, AA studies have not proceeded
apace with those of neighbouring language families. 
Comparatively speaking, little historical work has been done, 
and the proto-language of the family, the subfamilies, and most 
of the branches remain to be reconstructed. 

The speakers of the Munda sub-branch are considered to be
one of the oldest groups living in this subcontinent. Though
there are no literary texts in any of these languages, as no
language of the group was put to writing at any time in its
history, oblique references to these languages can be found in
the literary texts of Indo-Aryan languages. In ancient literature
some of these people were referred to as Nishada, Kolia, Billa,
Pulinda ~ Pulindra, Kirata, Sabara, etc. For instance, the
Kirata people of the Vedic texts must be the present Kharia
people; the Sabara people mentioned in the Aitareya Brahmana and
other early Sanskrit texts must be the present Soara people.
The Kharia and Juang consider themselves to be sections of
the Sabara people of olden times. The Juang women wore
leaves as their garments until recently, for which reason
they are also known as Patua. In this country most of these
people are found in the midst of the forests and mountains of central
and eastern India and in the Nicobar group of Islands. Only during
the last two centuries has a better understanding of these people
become possible.

In the early 19th century a beginning was made with the
attempts of Hodgson and Max Muller. Hodgson (1848, 1856)
hinted that there were actually three groups of languages in
India: the Himalayan, Indo-Aryan and Tamulian. The last one
included both the Dravidian and the Kol groups of languages.
In 1854 Max Muller separated the Dravidian from Kol/Austric
and referred to these people as ‘Munda’ for the first time. In
due course of time this term was accepted as referring to these
people as a group.

Robert Cust was perhaps the first scholar who tried to give
a systematic linguistic account of the different families of
Indian languages, in his ‘A Sketch of the Modern Languages of
the East Indies’ (1878). He discussed the Aryan, Dravidian,
Kolarian, Tibeto-Burman, Khasi, Tai, Mon-Annan and the
Malayan families of languages in different chapters of his
book. Not all of his linguistic statements are tenable now, but
some of the observations are interesting. He writes: “Like the
Dravidian, it (i.e. Munda) is morphologically agglutinative, but
with distinct characteristics. Like the Tibeto-Burman, it
probably found its way to its present habitat from the plateau
of Tibet, but it has so long been cut off from all connection
with that family by the stormwave of the Aryan immigration
down the valley of the Ganges, that nothing but faint
analogies survive. It must decidedly be treated as an
independent family, occupying ground in the provinces of
Bengal and Madras and the central provinces, chiefly in the
hills, and intermixed with the more energetic families, the
Aryan and Dravidian.... The Kolarian family has a higher
degree of inflection, and more complete indigenous
vocabularies, than the Dravidian. In its genders it makes a
distinction betwixt animate and inanimate objects. It has no
oblique forms for its nouns. It has a dual number, while the
Dravidian family has not. It has no negative voice. It has two
forms for each tense, which in most of the languages gives the
verb a transitive and intransitive meaning. It varies the
meaning of a root by infixing syllables, but never changes, like
the Dravidian, any of the letters of the root itself” (pp 79-80)
(SB)

With the effort of Wilhelm Schmidt in the early 20th
century, the language groups now identified as ‘Austroasiatic’
and ‘Austronesian’ came to be recognized. It was in 1906 that
Wilhelm Schmidt, the real discoverer of the
Austroasiatic family, first propounded the ‘Austric theory’. It was
aimed at providing a common racial, cultural and linguistic
origin for all the groups of people speaking Austroasiatic and
Austronesian languages. By the term ‘Austroasiatic’ he meant
the old loose Mon-Khmer group consisting of the Kolarian
(Munda), Khasi, Mon and Khmer languages (and not
Annamese), to which he added the Pronominalized Himalayan
languages, Nicobarese, Palaung, Wa, Riang, Semang, Sakai,
Malacca and Cham languages, all of which he held to belong to a
family called ‘Austric’ (‘austroasiatischer Sprachstamm’) (Pinnow 1963).

Though at that time serious doubts were raised about the
validity of this theory, later scholars found that the theory was
very largely accurate except on three points. First, he
considered the Pronominalized Himalayan languages as belonging to
this group (they were later removed). Second, he considered the
Cham languages as Austroasiatic; these were later classified as
belonging to Austronesian (Malayo-Polynesian). Third, he
considered Vietnamese to be non-Austroasiatic. In spite of
those limitations, Schmidt’s articles for the first time supported
the Austroasiatic hypothesis with lexical, phonological and
morphological evidence. They remain until today the basic
works of Austroasiatic studies (G.Diffloth: 1980). At present,
though there are many scholars working on individual
languages or on one or two sub-branches of the family, Gerard
Diffloth is probably the only scholar working extensively on
these languages from a comparative point of view.

Schmidt grouped the Munda languages into three
subgroups on the basis of the distribution of k and h (from
proto-Munda *q), as follows:

1. an eastern group (Kherwari, with h); 

2. a western group (Kurku, Kharia, Juang with k); and 

3. a mixed group (south-eastern Mundari with a loss of Proto 
Munda *q). 

As this classification was based on a single argument it 
could not do justice to the facts. 

Grierson has discussed Munda and Mon-Khmer languages in
three volumes of the Linguistic Survey of India, namely Vol. I
(Introduction: 1927), Vol. II (1903), and Vol. IV (1906).

In Vol. II, published in 1903, which deals with
Mon-Khmer languages, Grierson discusses Khasi, but not Munda. In
the introduction to that volume he mentioned that “Linguistic
evidence points to the conclusion that some form of
Mon-Khmer speech was once the language of the whole of Further
India”. By the term ‘Mon-Khmer’ he of course meant
‘Mon-Annan’. He has added a footnote: “It is not intended to suggest
that its speakers were the autochthons of this region. They
probably migrated from North-western China and dispossessed
the aborigines, as they in turn, were dispossessed by the
Tibeto-Burmans and the Tais”. Interestingly, Grierson does not
include Munda in this volume. He stressed the linguistic
differences lying between Munda and the Mon-Khmer
languages, and said in conclusion: “owing to the existence of
these differences (i.e. differences in matter of affixation, word
order, etc.) we should not be justified in assuming a common
origin for the Mon-Khmer languages on the one hand, and for
the Munda, Nancowry and the Malacca languages on the
other”. But he agreed with the scholars preceding him, like
Logan, Forbes, and others, that a common substratum lying at
the bottom of all these Indian and South-East Asiatic languages
might be responsible for the linguistic agreements.

The fourth Volume of LSI was edited by Sten Konow, and
published in 1906. Konow included Munda and Dravidian
languages in this volume, probably because he felt that they
belonged to the same racial group. The LSI gave the status of
family to these languages and also gave the name of ‘Munda’,
as suggested by Max Muller. This Volume contains some
interesting points: “... one-fifth of the total population of India
speaks languages belonging to the Munda and Dravidian
families. These forms of speech have been called by
anthropologists ‘the languages of the Dravida race’” (p 1).
According to the eminent German philologist and ethnologist
Friedrich Muller, the ‘Dravidian race’ included the Munda dialects,
Singhalese and the Dravidian languages proper (p 2); on the
other hand, “the Mundas and the Dravidas belong to the
same ethnic stock” (p 5). ... “For our present purposes it is
sufficient to state that the languages of the Mundas and the
Dravidas are not connected but form two quite independent
families”. This fourth volume has identified seven Munda
languages: Korku, Mundari, Santali, Kharia, Juang, Savara
and Gadaba. Under Korku, Nahali was included.

The Introductory volume of LSI, which was published in
1927, shows some change of opinion by Grierson. By that
time Grierson seems to have read Wilhelm Schmidt’s ‘Austric
theory’, and in this volume he gives it his full support.

Later, H.J.Pinnow (in 1959 through his extensive studies 
on Munda languages) showed that the verbal inflection of all 
Munda languages is traceable to a Proto-Munda inflectional 
system which was later expanded in the north and considerably 
reduced in the south. From this evidence and on the basis of 
lexical differences the Munda languages were classified into a 
northern group with the subgroups Kurku and Kherwari 
(Santali, Mundari, Korwa, etc., belong to the latter group), 
and a southern group which is further subdivided into a 
central group (including Kharia and Juang), and a
south-eastern group (including Sora, Pareng, Gutob and Remo).
The relation of Kherwari and Kurku is much closer than that 
of central and south-eastern Munda, which must have been 
separated much earlier than Kherwari and Kurku. This 
classification of Pinnow was adopted with slight modifications 
by the later scholars like Norman Zide and David Stampe. It 
recognized the following languages — Gatah, Gutob, Ho, 
Juang, Kharia, Korku, Korwa, Mundari, Remo, Santali and 
Sora. 

AUSTROASIATIC LANGUAGE FAMILY

Munda               Mon-Khmer

Santali             Khasian > Khasi
Mundari             Palaungic > Palaung, Wa
Ho                  Monic > Mon
Korwa               Khmuic
Korku               Viet-Muong > Vietnamese
Kharia              Katuic > Katu
Juang               Bahnaric > Bahnar
Sora                Pearic > Pear
Gorum/Parengi       Khmer
Gutob               North Aslian > Kensiw
Remo/Bonda          Senoic > Temiar, Semai
Gtah/Diday(i)       South Aslian > Semelai
                    Nicobarese
                    (more than 100 languages)

The complete stammbaum of the Munda sub-group is as follows:

PM
    SM
        Koraput M
            GRG: Gutob, Remo, Gtah
            S-G: Sora, Gorum
        Central M
            K-J: Kharia, Juang
    NM
        Kherwarian M
            Santali
            Mund-Ho: Mundari, Ho, Korwa
            (Birhor), (Asuri), etc.
        Korku



In 1975 Bhattacharya provided a fresh classification based
mainly on his own field work on different Munda languages
and also on other available sources. He identified ten
independent Munda languages and six important dialects.
These languages are grouped into two branches and
sub-grouped further. This classification is graphically represented
in the following stammbaum:
Munda
    Lower Munda
        Gutob-Bonda: Gutob, Bonda (Remo), Didey (Gtah)
        Parengi-Sora: Parengi (Gorum), Sora
        Juang-Kharia: Juang, Kharia
    Upper Munda
        Kherwari: Santali (Birhor, Asuri), Mundari (Ho, Korwa)
        Korku: Korku (Muwasi, E.Korawa, Koraku)


Bhattacharya showed that there are many major differences
between Upper Munda (UM) languages and Lower Munda (LM)
languages. Firstly, pronominal object markers are absent from
verb stems in LM but present in UM.
This is the main distinguishing feature. Secondly, the
languages of LM have -n as the genitive marker, while all the UM
languages have -a/-a'. Thirdly, in combination with kinship
terms or words for parts of the body, the pronominal elements
of the first and second persons are usually prefixed in LM but
suffixed in UM. Also, in the formation of non-singular
numbers LM differs from UM.

Arlene Zide, during the Munda project of the 1960s, identified a
language called Juray and considered it to be
closely related to the Sora language. It is supposed to have
approximately 6,000-7,000 speakers and is spoken in ca. 25-30
villages in Ganjam district of Orissa and neighbouring districts
in both AP and Orissa. Till 1972 it was considered a dialect of
Sora. After that nothing much is known about it.

In 2001 Anderson proposed a so-called new classification of 
Munda languages, which is somewhat different from the 
earlier ones. It is provided here for comparison. 


Proto-Munda 



North Munda South Munda 



Still there are problems in the identification of certain
language names and groupings. For instance, in S.B.’s
classification above (i) Birhor and Asuri are considered as
dialects of Santali; (ii) Ho is considered as a dialect of
Mundari; and (iii) Muwasi, E.Korawa and Koraku are
considered as dialects of Korku. But according to
Pinnow-Stampe, except for Birhor and Asuri, Ho, Koraku and
Kor(a)wa are separate languages. The terminologies are also
different, which creates confusion. Therefore standardization
of terminologies is very urgently required. In this work
the terminologies provided by Pinnow-Stampe are used.
However, alternative terminologies are provided in the
headings themselves wherever available. Also, there is a
mismatch between what linguists identify as independent
Munda languages and what the Census of India considers as
independent languages. So, even in the 1991/2001 Census one
can come across an entry called ‘Munda’, which is not an
individual language according to the linguists. There is also a
mismatch between what the People of India report says about
some of them and what the linguists say. The report states that the
Korwas and Parengis speak Indo-Aryan languages, whereas linguists
consider that they speak Austroasiatic languages.

As far as Nahali is concerned, Pinnow considered it to
possibly belong to the western group of Austroasiatic along
with Munda. Stampe-Zide considered it to belong to
Austroasiatic, if not to the Munda branch itself. On the other hand,
Bhattacharya, the only scholar to have collected some data on
that language (besides LSI itself), does not consider it to
belong to this family at all. Recent work on this has
ascertained that it does not belong to this family at all.

COMPARATIVE GROWTH OF NON-SCHEDULED
AUSTROASIATIC LANGUAGES OF INDIA 1971, 1981, 1991
AND 2001 (From Census of India 2001: Paper 1 of 2007:
Language. Office of the Registrar General, India, New Delhi.)

Language          1971         1981         1991         2001

Bhumij:#          51,651       50,384       45,302       47,443
Gadaba*:          20,420       28,027       28,158       26,262
Ho:#              751,389      783,301      949,216      1,042,724
Juang:            12,172       19,038       16,858       23,858
Kharia:           191,421      212,605      225,556      239,608
Koda/Kora         14,333       23,113       28,200       43,030
Korku:            307,434      347,661      466,073      574,481
Korwa:#           -            18,079       27,485       34,586
Mundari:          771,253      742,739      861,378      1,061,352
Munda:#           309,293      377,492      413,894      469,357
Santali:          3,786,899    4,332,511    5,216,325    6,469,600
Savara:*          222,018      209,092      273,168      252,519
Nicobarese:       17,971       21,542       26,261       28,784
Khasi:            479,028      628,846      912,283      1,128,575

Assam border; Ho, with about one million speakers mainly in
Jharkhand and Orissa; Mundari, with about one million
speakers mainly in Jharkhand; and Korku, the westernmost
Munda language, spoken by nearly six lakh people in southern
Madhya Pradesh and northern Maharashtra. Munda languages
differ from all other Austroasiatic languages in having
complex morphology and in having basic subject-object-verb
rather than subject-verb-object word order.

As the Mon-Khmer part has been covered extensively by Paul
Sidwell, it has not been touched upon here. Below, a review of
work on Munda languages is followed by a bibliography on the
languages concerned, taking into account the developments of
the last two to three decades.

Descriptive works: Though we know of the existence of around
12 Munda languages, we do not have good modern descriptions
of all of them. For instance, languages like Santali, Mundari,
Korku, Kharia and Bhumij have good grammatical descriptions,
but the others do not. Similarly, modern bilingual/multilingual
dictionaries are still remote for most of them. Santali boasts
Bodding’s 5-volume dictionary and Campbell’s 3-volume
dictionary, and Mundari has Hoffman & Emelen’s Encyclopaedia
Mundarica in 13 volumes. Others are not that fortunate.
Similarly, collection of folk literature is lagging for most of
the languages. Again, Santali and Mundari score over the
others, Bodding’s three-volume work on Santali being a very
good collection. Others are far behind.

As most of the languages were unwritten until recently, some of
them are at present written in more than one adopted script,
depending upon where they are spoken (e.g. Santali in Hindi,
Oriya, Bengali, etc.). Even though some languages have come up
with their ‘own’ scripts (Santali, Ho), not much progress has
been made in using them for preparing school books, etc. In some
places some of these languages have been introduced in schools.
But most of the other languages, particularly the Munda
languages of Orissa (which are highly endangered), do not have
this privilege.

Dialect studies are also lacking. There is an urgent need to
study all the dialects of the languages so that the extent of
variation can be understood. This step will pave the way for
reconstruction of earlier stages of the respective languages,
which in turn is very important for reconstruction of the
proto-stage of the group concerned.

Historical and comparative work shows some progress with the
works of Anderson and others. But here also, without good
descriptive works, comparative works and subsequent
reconstructions of earlier stages will not be very convincing.

Regarding the homeland of the Munda people, more research
needs to be undertaken along multi-disciplinary lines, as no
single-discipline approach will bear fruit.

Khasi, of the Mon-Khmer branch, is much better placed in terms
of usage, education, literature and mass media. Good grammars
and dictionaries are available; but surprisingly, since
Nagaraja’s 1985 grammar no new grammar has come to light. A
dialect survey of Khasi has been undertaken as a major project
by the Central Institute of Indian Languages, Mysore, for the
last couple of years. The preliminary results are to be out
soon.

There is a great need for comparative studies taking Khasi 
on the one hand and other Mon-Khmer languages on the other. 

Appendix 

Below, various classifications of the Austroasiatic languages
proposed by different scholars, as available on the Internet,
are provided.

Classification 

Linguists traditionally recognize two primary divisions of
Austro-Asiatic: the Mon-Khmer languages of Southeast Asia,
Northeast India and the Nicobar Islands, and the Munda
languages of East and Central India and parts of Bangladesh.
Ethnologue identifies 168 Austro-Asiatic languages, of which
147 are Mon-Khmer and 21 are Munda. However, no evidence
for this classification has ever been published.

Each of the families that is written in boldface type below
is accepted as a valid clade. However, the relationships
between these families within Austro-Asiatic are debated; in
addition to the traditional classification, two recent proposals
are given, neither of which accepts traditional Mon-Khmer as a
valid unit. It should be noted that little of the data used for
competing classifications has ever been published, and it
therefore cannot be evaluated by peer review.

Gerard Diffloth (1974) 

This bipartite classification, given by Diffloth (1974) and
used in the Encyclopaedia Britannica, is widely cited. Several
languages that were not known at the time are missing, and many
of the names used then are no longer current.

Munda:

• North Munda
   o Korku
   o Kherwarian
      ■ Kherwari branch: Agariya, Bijori, Koraku
      ■ Mundari branch: Mundari, Bhumij, Asuri, Koda, Ho, Birhor
      ■ Santali branch: Santali, Mahali, Turi

• South Munda
   o Kharia-Juang: Kharia, Juang
   o Koraput Munda
      ■ Remo branch: Gata (Gta), Bondo (Remo), Bodo Gadaba (Gutob)
      ■ Savara branch [Sora-Juray-Gorum]: Parengi (Gorum) [in
        Koraput District], Sora (Savara), Juray, Lodhi

Mon-Khmer:

• Eastern Mon-Khmer
      ■ Khmer (Cambodian)
      ■ Pearic
      ■ Bahnaric
      ■ Katuic
      ■ Vietic (includes Vietnamese)

• Northern Mon-Khmer
      ■ Khasi (Meghalaya, India)
      ■ Palaungic
      ■ Khmuic

• Southern Mon-Khmer
      ■ Mon
      ■ Aslian (Malaya)
      ■ Nicobarese (Nicobar Islands)

Ilia Peiros (2004) 

Peiros's is a lexicostatistic classification, based on percentages
of shared vocabulary. This means that a language may appear
to be more distantly related than it actually is due to language
contact, so it is only a starting point for a proper genealogical
classification.

• Nicobarese

• Munda-Khmer
   o Munda
   o Mon-Khmer
      ■ Khasi
      ■ Nuclear Mon-Khmer
         ■ Mangic (Mang + Palyu) (perhaps in Northern MK)
         ■ Vietic (perhaps in Northern MK)
         ■ Northern Mon-Khmer
            ■ Palaungic
            ■ Khmuic
         ■ Central Mon-Khmer
            ■ Khmer dialects
            ■ Pearic
            ■ Asli-Bahnaric
               ■ Aslian
               ■ Mon-Bahnaric
                  ■ Monic
                  ■ Katu-Bahnaric
                     ■ Katuic
                     ■ Bahnaric

Gerard Diffloth (2005) provides a somewhat changed picture. 
Here rather than counting cognates, Diffloth compares 
reconstructions of various clades, and attempts to classify 
them based on shared innovations. 

Munda languages (India)
   Koraput: 7 languages
   Core Munda languages
      Kharian-Juang: 2 languages
      North Munda languages
         Korku
         Kherwarian: 12 languages

Khasi-Khmuic languages
   Khasian: 3 languages of eastern India and Bangladesh.
   Palaungo-Khmuic languages
      Khmuic: 13 languages of Laos and Thailand.
      Palaungo-Pakanic languages
         Pakanic or Palyu: 4 or 5 languages of southern China and
         Vietnam.
         Palaungic: 21 languages of Myanmar, southern China, and
         Thailand.

Nuclear Mon-Khmer languages
   Khmero-Vietic languages
      Vieto-Katuic languages
         Vietic: 10 languages of Vietnam and Laos, including the
         Vietnamese language, which has the most speakers of any
         Austro-Asiatic language. These are the only Austro-Asiatic
         languages to have highly developed tone systems.
         Katuic: 19 languages of Laos, Vietnam, and Thailand.
      Khmero-Bahnaric languages
         Bahnaric: 40 languages of Vietnam, Laos, and Cambodia.
         Khmeric languages
            The Khmer dialects of Cambodia, Thailand, and Vietnam.
            Pearic: 6 languages of Cambodia.
   Nico-Monic languages
      Nicobarese languages: 6 languages of the Nicobar Islands, a
      territory of India.
      Asli-Monic languages
         Aslian: 19 languages of peninsular Malaysia and Thailand.
         Monic: 2 languages, the Mon language of Myanmar and the
         Nyahkur language of Thailand.

There are in addition several unclassified languages of southern
China.

Finally, Diffloth’s (2001, 2005) Austroasiatic language family tree
with his tentative calibration of time depths is given below.

[Tree diagram not reproduced. Time axis: 1000 AD, 0 AD, 1000 BC,
2000 BC, 3000 BC, 4000 BC, 5000 BC. Terminal branches: Korku,
Kherwarian, Kharia-Juang, Koraput, Khasian, Pakanic, Eastern
Palaungic, Western Palaungic, Khmuic, Vietic, Eastern Katuic,
Western Katuic, Western Bahnaric, Northwestern Bahnaric, Northern
Bahnaric, Central Bahnaric, Southern Bahnaric, Khmeric, Pearic,
Monic, Northern Aslian, Senoic, Southern Aslian, Nicobarese.
Higher nodes: Munda, Khasi-Khmuic, Vieto-Katuic, Khmero-Vietic,
Khmero-Bahnaric, Asli-Monic, Nico-Monic, Mon-Khmer.]

Diagram 1: Diffloth’s (2001, 2005) Austroasiatic language family
tree with his tentative calibration of time depths.

Austroasiatic Bibliography (mainly India-centric) 
In the two decades since ‘An Annotated Bibliography on
Austroasiatic languages’ was published in 1989 by Nagaraja,
and one by Arun Ghosh in 1988, some important publications
have come out. The present listing is mainly restricted to that
period, though some significant works of earlier decades have
also been included for the benefit of those who have not seen
those bibliographies.

Probably the most important work to come out so far is: 
Anderson, Gregory D.S. (ed.). 2008. The Munda languages. 
Routledge. 

This is the first work to provide a fairly detailed description
of most of the Munda languages. After a general introduction
which discusses the languages and their subgroupings,
descriptive accounts of various languages are provided.
The languages covered are: Santali, Mundari (including Kera
Mundari), Ho and other Kherwarian languages, Korku, Gorum,
Kharia, Juang, Remo, Gutob and Gta?, together with a chapter
‘On Nihali’. The descriptions briefly cover phonology,
morphology and syntax, with a bit of lexicon and some sample
data.

Under other Kherwarian languages, short notes on the
following varieties have been added: Bhumij, Korwa, Asuri,
Birhor, Turi. Except for Ho and Bhumij, these are highly
endangered varieties. Interestingly, all of them are close to
Mundari, and are even treated as dialects of Mundari by most
scholars. But the lack of adequate descriptions makes it
difficult to decide on their status.

All other works have been listed in alphabetical order. 

Anderson, Gregory D.S. 2000. ‘Split-Inflection in Auxiliary Verb
   Constructions.’ In: N.M. Antrim, G. Goodall et al. (eds.).
   WECOL 1999.

---. 2001. ‘A New Classification of South Munda: Evidence from
   Comparative Verb Morphology.’ Indian Linguistics 62: 21-36.

---. 2003. ‘Dravidian influence on Munda.’ International Journal
   of Dravidian Linguistics 32.1: 27-48.

---. 2004. ‘Advances in Proto-Munda reconstruction.’ Mon-Khmer
   Studies 34: 159-184.

---. 2006. Auxiliary Verb Constructions. Oxford: OUP.

---. 2007. The Munda Verb, typological perspectives. Trends in
   Linguistics, Studies and Monographs 174. Berlin: Mouton de
   Gruyter, pp. xiv+306.

---. 2008. ‘Gta?.’ In: Anderson, Gregory D.S. (ed.). 2008. The
   Munda languages. Routledge. 682-763.
CONSIDERATIONS ON THE HOMELAND OF
AUSTROASIATIC PEOPLE

Keynote address 

Gerard Diffloth 
Siem Reap, Cambodia 

A few decades ago the notion of the homeland of a linguistic
family had a certain romantic appeal, based on essentialist
notions of language and culture; such notions have now been
mostly discarded.

However, the issue of homeland is interesting from
archaeological, anthropological, ecological and genetic as well
as linguistic points of view. For linguists it is very important
in discussing borrowings, substrata, language shifts, areal
diffusion, drift and typological mutations.

There are two ways to work on the issue of homeland. One
is the historical linguistic method of comparison and
reconstruction: comparing the related languages and branches
within the language family and reconstructing lexical items
and grammatical structures for the proto-language. If it is
known where the deepest historical division is, that is, where
the earliest branching took place, it is easy to pinpoint that
location as the original place of the people. Here the
geographical distribution and clustering of the Austro-Asiatic
languages is important.

Another is to study place names, as toponyms connect
language to land, and names of flora and fauna. This was
demonstrated by mentioning some wild animals, such as the
peacock, which are found in Southeast Asia and India and whose
names can be reconstructed to the proto-Austro-Asiatic stage.
The habitat of these animals suggests that they can live only
in a tropical, humid environment. Since the terms for these
animals are reconstructible to the proto stage, it could be
speculated that the people who spoke the earliest form of
Austro-Asiatic lived in an area within the tropical zone. This
rules out the idea of a homeland of Austro-Asiatic in China.

Geographic clustering, in the case of Austro-Asiatic, does
not yield immediate results, but a second reading of the
evidence could confirm a southern origin, more specifically on
the shores of the Bay of Bengal on either side of the divide
between India and Southeast Asia.



EMPOWERMENT OF TRIBAL COMMUNITIES

Inaugural Speech 


Ram Dayal Munda 
Ranchi, India 


In his inaugural speech Prof. Munda addressed the issues of 
exploitation of the communities speaking Austro-Asiatic 
languages both in India and South East Asia, education as a 
means for empowerment and linguistic studies of Austro- 
Asiatic languages. 

Natural resources are located where the Austro-Asiatic
communities live, and the communities are exploited by the
state in the name of making use of those resources. There is
therefore a need for the empowerment of these mainly
marginalized communities. This is possible when education is
available in their mother tongues, and when their culture and
literature are part of the school curriculum. He applauded the
role of the Central Institute of Indian Languages, Mysore, in
publishing linguistic studies and educational material on
Austro-Asiatic languages.

Congratulating the linguists presently working on
Austro-Asiatic languages, he emphasized functional study as
well as structural study. It is equally important to know how
linguistic structures are used in various situations by the
speakers, for example how language is used when a Munda
speaker is worshipping and invoking the deity. Socio-cultural
aspects should be included in linguistic studies.

He appealed to the researchers present to help the 
communities to be empowered in order to preserve their 
culture, languages and identities and subsequently the cultural 
and linguistic diversity in the age of globalization. 
ABSTRACT

Just over a century since the foundational studies of 
Wilhelm Schmidt, and the extraordinary 
effervescence that the field enjoyed in the 1960s and 
1970s, we have again reached an important watershed 
in the history of Mon-Khmer (MK) comparative- 
historical studies. 2006 saw the publication of Shorto's 
A Mon-Khmer Comparative Dictionary (MKCD), a 
wealth of extensive descriptive and lexical data is now 
available, and the International Conference of 
Austroasiatic Linguistics (ICAAL) meetings are 
resuming after a hiatus of three decades. At this
historic juncture it is appropriate to pause and reflect
on this history of comparative MK studies, from its
foundations in philological and neogrammarian
methods and principles, through the field-work driven
structuralist-descriptive phase that has characterized
the second half of the 20th century. I analyze the
strengths and weaknesses of this work, drawing
attention to programmatic aspects of the approaches 
taken by various scholars, especially contrasting their 
use of comparative and philological methods in 
historical linguistic reconstruction. Progress in 
comparative MK has always come in fits and starts, 
yet prospects have never been better as we launch into 
this new era with more data, new tools, and a clarity 
of purpose gained from hindsight. 
1. Introduction

The MK languages form the oldest and most diverse language 
family of Mainland Southeast Asia. MK speakers colonized all 
available ecological niches the region has to offer, from 
mountain slopes to tropical islands. Along the way they 
diverged widely; some barely subsisted in near-stone-age 
conditions, while others built great civilizations still recognized 
as hallmark achievements of mankind. Enriched by their 
complex history of contact with other Asian societies, the MK 
languages present a deeply evocative record of the region that 
can be revealed by comparative-historical analysis. But the 
comparative study of languages does not proceed easily. 
Scholars must have access to extensive data of suitable quality. 
They need to understand disparate writing systems and diverse 
historical and cultural contexts, and must learn to recognize the 
influences of numerous contact languages. The work, 
particularly in the pre-computer era, is excruciatingly time 
consuming, so that investigators must have strong institutional 
support or extensive private means. And since no one
researcher can be an authority on all the languages, there must
be qualified collaborators willing to offer data and technical
advice. The larger and older the language family, the more 
daunting these tasks become. This has certainly been the case 
in regard to the MK languages, where instead of a century of 
incremental progress we have seen fits and starts, great leaps, 
dead ends, and numerous frustrations. The 20th century 
certainly began well with the grand synthesis of comparative 
MK attempted by Schmidt and discussed below, yet his work 
suffered from many of these limitations, and no comparable 
effort would appear within the century. Instead, as time went
on, scholars focused mainly on specific languages or
subgroupings, and rarely considered them in the broader MK
historical context. Genuine cooperation among specialists has
been uncommon; indeed the story of comparative MK research
over the past hundred years and more has been one of highly
motivated, but usually solitary, individuals. Although they
achieved much—often at great personal sacrifice—they also 
faced barriers that hampered progress, and occasionally made 
serious errors that reduced the usefulness of their results. 

2. End of the 19th and Beginning of the 20th Centuries

The main factor leading to the birth of comparative MK studies
was the European colonization of Southeast Asia, as scholars
began to have access to increasingly reliable lexical data
developed by colonial authorities, businessmen, and
missionaries. At first these sources chiefly produced lexicons
useful for translation in support of administration, trading,
and proselytization; then true dictionaries of languages with
written traditions (Mon, Khmer and Vietnamese²) gradually
became available. Since the mid-1800s various authors had begun
to recognize the genetic relations between languages that we
now understand constitute the MK family. As improved sources
became available the
picture gradually became clearer, and the possibility of 
comparative-historical reconstruction became real, 
crystallizing around 1900. Among the most important of the 
early dictionaries and lexicons were: 

• Cambodian (Aymonier 1874, Feer 1877, Moura 1878 etc.) 

• Mon (Haswell 1874, Stevens 1896 etc.) 

• Nicobarese (Man 1889) 

• Palaung (Scott & Hardiman 1900) 

• Bahnar (Dourisboure 1889) 

• Stieng (Azemar 1886) 

• Various Aslian (Morgan 1885; Blagden 1894 etc.) 

• Numerous lists compiled during the Pavie Expeditions into 
Indo-China 1879-1895. 

We are not surprised that the beginning of the 20th century 
saw a great burst of activity that placed the comparative 
understanding of the MK family on a solid footing. Particularly 
noteworthy were: 

• a series of studies by Schmidt (1901, 1904, 1905, 1906)

• a massive Aslian compilation by Skeat and Blagden (1906)

• a broad, well organized comparative vocabulary by Cabaton
(1906)

• the Linguistic Survey of India by Grierson (11 vols. from 1903
to 1928)

The most important of these from the analytical perspective
were the works of Schmidt. Before going further, though, it
is useful to discuss the use and scope of the term "Mon-Khmer"
in this context. Since the first identification had been
made by Mason in 1854, it was apparent that the Munda
languages of India could and should group with Mon, Khmer
etc. in an Austroasiatic super-family. But to Schmidt and
others of his time the term Mon-Khmer was only used to
designate Mon, Khmer, and other languages that appeared to
be self-evidently close to them, such as Bahnar and Stieng.
When groups like Palaungic and Aslian were subsequently
identified as related, they were not necessarily classified as
MK. Rather, they were variously treated as forming other
distinct Austroasiatic branches. Indeed, it was only in the late
1960s that the now-received model, in which all of the non- 
Munda languages belong to one large MK branch, really 
emerged. Throughout this paper the term MK is used to refer 
to the generally received notion of all Austroasiatic languages 
other than those of the Munda branch(es). 

3. Pater Wilhelm Schmidt 

The German comparativist and ethnologist Pater Wilhelm 
Schmidt (1868-1954) effectively established the field of 
comparative MK studies with a series of four major 
publications at the beginning of the twentieth century (1901, 
1904, 1905, 1906). Taken as a whole these four works form a
more or less coherent account of the MK family as it was
known at the time. The first of these (1901) is a
monograph-length (142 pages) paper that examines the Aslian
languages of Malaya, demonstrating that they are genetically
related to other MK tongues. The study is substantially lexical,
including a vocabulary of 1232 entries, plus some samples of
texts. The languages are classified into two major subgroups
using distinctive vocabulary tests: Semang (Northern Aslian) and
Sakei (a Southern group with further sub-divisions). Schmidt
also achieves a successful morphological analysis. Schmidt
(1904) discusses Khasi and Palaungic, correctly identifying the
latter as an important subgroup. Much of the treatment of
Khasi is an extensive morphological analysis that identifies
numerous prefixes and infixes, seeking to provide a
comprehensive account of word formation. Unfortunately
Schmidt's analysis goes somewhat too far, since he decided
that all Khasi sesquisyllables were formed by affixation of
monosyllabic roots. He was partly stimulated to this view by
the fact that there is a phonological tendency for the elision of
initial consonants from clusters, so that it was not clear to
Schmidt that in some cases he was looking at secondary
monosyllables rather than etymological monosyllables.
Schmidt then took his idea further, and suggested that the
model be applied to MK generally. This seductively simple but
ultimately mistaken model would influence some scholars right
up to recent times, such is Schmidt's reputation.³ This idea
would also influence his (1906) attempt to relate MK and
Austronesian, since it gave some formal bases for comparing
word-forms.

The 1904 paper also prefigures to some extent the methods 
used in his 1905 opus. Schmidt directly compares Khasi 
vocabulary to Mon, Khmer, Bahnar and Stieng in an attempt to 
justify various word-formational formulae. In retrospect, not 
all the comparisons are valid, or in some cases the 
interpretations are simply too bold. While the underlying 
notion of using systematic comparative data to analyze Khasi 
morphology is a good one, Schmidt was not quite in a position 
to fully appreciate the significance of the data he had before 
him. 

The third paper (Schmidt 1905, often simply referred to as
the Grundzüge [roughly, 'Foundations'] from the title) is an
extensive comparative treatment of Mon, Khmer, Stieng and
Bahnar. In it Schmidt set out to establish regular sound
correspondences on the basis of an extensive and fairly reliable
data set. To make this study Schmidt used Written Mon and
Written Khmer, assuming that their Indic-based spellings
faithfully recorded historical phonetic values, plus
contemporary lexicons of Bahnar and Stieng (Dourisboure
1889 and Azemar 1886, each recorded in Latin script). In
choosing to work with the two languages that had long written
traditions Schmidt followed well-established philological
methods, as the comparative investigation of Indo-European
had proceeded principally by using Latin, Greek and Sanskrit.
He was also correct in assuming that the writing systems of
these languages would be conservative, since both Mon and
Khmer underwent extensive phonological restructuring after
their writing conventions became established. On the other
hand, in some respects Schmidt erred in following the Indic
spellings too slavishly, as they include some non-etymological
and non-phonetic information, such as use of final voiced stops
and retroflex consonants. Yet Schmidt was also well aware of
Pali/Sanskrit loans in Mon and Khmer, and discussed many of
them. He also concluded (wrongly, it turns out) that some MK
etyma must be very ancient Indic loans (such as PMK 'water'
*daak < Sanskrit udaka). Notwithstanding these problems,
Schmidt successfully determined the basic outline of Proto-MK
consonantism. He was also able to offer morphological
analyses that related affixes to various phonetic changes, and
he established the importance of the patterning of segmental
collocations within rhymes, insights that have been the
foundation of successful phonological reconstruction of MK
languages to this day.

Schmidt encountered his most serious difficulty in his
treatment of the vowels. The biggest problem was his handling
of long and short /a/. These are the most common vowels, and
occur in some 45% of his comparisons (and, generally, in
around one-third of all MK etyma). Schmidt was apparently
unaware that Mon and Khmer use similar spellings to represent
different vowel phonemes, and consequently he erred in
consistently transcribing /a/ and /a:/ according to Indic
readings, losing their etymological values, and fundamentally
undermining his analysis of the historical systems. The
apparent confusion of vowel correspondences created such
difficulty for Schmidt that he was only able to posit phonetic
equations, and not proper vowel reconstructions. Despite its
various difficulties, the Grundzüge, with more than 900 MK
lexical comparisons, reconstruction of proto-consonantism, and
morphological analyses, laid the foundation for all subsequent
comparative work. Its influence is most obviously evident in
Shorto's (2006) posthumous MK comparative dictionary, where it
is clear that the Grundzüge provided the skeleton upon which
Shorto more-or-less directly built his edifice.

Schmidt's final work in this series was his (1906) attempt to
link MK and Austronesian within a grand Austric macro-family.
This bold hypothesis remains controversial. Although it faces
serious difficulties in the light of Shutler and Marck's (1975)
now generally accepted formulation of a Formosan homeland
for Austronesian,⁴ it still competes with other models of deep
genetic relationship among the various language families of
Asia (see Sagart et al. 2005 for recent papers and discussion).
Schmidt’s evidence was both lexical (more than 200
comparisons) and morphological (including parallels in
prefixes and infixes), and has found a small and enthusiastic
following, including Shorto, and nowadays notably Lawrence
Reid (e.g. 1994, 1996, 2005). The most concrete result to
come out of Schmidt (1906) is the discussion of the internal
classification of MK languages. It posits three branches as
follows:

1. a) Semang 

b) Senoi 

2. a) Khasi 

b) Nicobarese 

c) Wa, Palaung, Riang 

3. a) Mon-Khmer (Mon, Khmer, Bahnar, Stieng, etc.) 

b) Munda 

c) Cham, Rade, Sedang 

Group 1 represents the Aslian languages of Malaysia. In group 
2, entries a) and c) are Northern MK in today's formulation, 
with b) Nicobarese being anomalous in that group. In 
retrospect, group 3 is the most heterogeneous: Munda is 
arguably separate from all of the languages above, occupying a 
different Austroasiatic branch, while Cham and Rade are 
nowadays recognized as Austronesian, although with a large 
borrowed MK component to their lexicon. Schmidt also 
included Sedang in this group, a Bahnaric language that has 
enjoyed considerable Cham and Khmer influence (although 
perhaps not as much as Bahnar itself).

The above scheme is not as strange as it might first appear
to modern eyes. Removing the most serious anomalies in 3 b)
and c) we are left with three main branches that resemble in
many important respects the three-branch classification of 
Diffloth (1989 and passim.) that now has a more-or-less 
received status among specialists. The essential difference is 
that Diffloth would place Nicobarese and Mon into the Group 
1 (or Southern) branch alongside Semang and Senoi (Aslian). 
Neither Diffloth nor any other writer has, to my knowledge, 
offered in print a detailed proof of this classification (or 
disproof of any other) based upon a comprehensive 
comparative reconstruction of the breakup of Proto-MK. 
Indeed, the most important works of the last century on the 
issue of MK classification have been based upon 
lexicostatistics, a method that is generally recognized as only 
suitable for preliminary investigations. 
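
The lexicostatistic procedure referred to above can be
illustrated with a minimal sketch. It assumes a hypothetical
table of cognate-class judgments for three placeholder
languages (A, B, C) over a four-item meaning list; the data,
the labels and the function names are invented for illustration
and are not drawn from any of the works cited. The sketch only
computes, for each pair of languages, the percentage of
meanings whose forms were judged to belong to the same cognate
class. Because it counts resemblances without distinguishing
inheritance from borrowing, the resulting figures are at best a
first approximation to a genealogical grouping.

```python
from itertools import combinations

# Hypothetical cognate judgments: meaning -> {language: cognate-class label}.
# Languages A, B, C and every label below are placeholders, not real data.
COGNATE_CLASSES = {
    "water": {"A": 1, "B": 1, "C": 2},
    "fire":  {"A": 1, "B": 1, "C": 1},
    "hand":  {"A": 1, "B": 2, "C": 2},
    "eye":   {"A": 1, "B": 1, "C": 1},
}


def shared_percentage(table, lang_x, lang_y):
    """Percentage of meanings, attested in both languages, whose forms
    were assigned to the same cognate class."""
    comparable = [row for row in table.values()
                  if lang_x in row and lang_y in row]
    if not comparable:
        raise ValueError("no meanings attested in both languages")
    shared = sum(1 for row in comparable if row[lang_x] == row[lang_y])
    return 100.0 * shared / len(comparable)


def pairwise_scores(table):
    """Shared-cognate percentage for every unordered pair of languages."""
    langs = sorted({lang for row in table.values() for lang in row})
    return {(x, y): shared_percentage(table, x, y)
            for x, y in combinations(langs, 2)}


if __name__ == "__main__":
    for pair, score in pairwise_scores(COGNATE_CLASSES).items():
        print(pair, f"{score:.0f}%")
```

In this toy table A and B, and B and C, each share three of
four items, while A and C share two; a classification built on
such figures alone would depend entirely on the quality of the
underlying cognate judgments.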

4. Other Early 20th Century Works

Contemporary with Schmidt was Skeat and Blagden’s (1906)
massive comparative Aslian lexicon. It was richly annotated
with wider MK comparisons, letting it serve effectively as an
etymological dictionary. It is a useful complement to
Schmidt's work, and has been relied upon by comparativists in
the decades since. However, Skeat and Blagden did not use 
their lexical materials within the framework of comparative 
reconstruction or the genetic theory of language as we know it. 
Instead, they focused on what they saw as lexical differences, 
for example a list of 50 Semang words they saw as having no 
MK parallels, arguing for waves of migration causing partial 
language shifts among earlier Negrito (non-MK) populations 
on the peninsula. 

Notwithstanding their diffusionist views, Skeat and Blagden 
did advance our understanding of Aslian beyond the first 
simple classification offered by Schmidt (1901). They fleshed 
out the basic North-South divisions, employing both lexical 
and phonological criteria. Their scheme was later improved by 
Schebesta's (1926) study, which established the essential 
classification that remains in use today (for further discussion 
see Benjamin 1976 and Matisoff 2003). 

Another important resource that became available during 
this highly productive period was the Linguistic Survey of 
India, with Volume 2 "Mon-Khmer and Siamese-Chinese 
families" and Volume 4 "Munda and Dravidian" (Grierson 
1904, 1906 respectively) giving useful lexicons and descriptive 
information. Contemporaneously, a well-organized data 
collection that proved influential was Cabaton's 1905 
presentation. In this paper he gives some 416 cognate sets for 
28 languages, including many small languages of the Annamite 
Range for which no better sources would be available until the 
second half of the 20th century. In addition to his analysis of 
lexical data, Cabaton discusses the phonological systems of 
Cham, Khmer, Malay, Bahnar, Chrau and Stieng. 

However, despite ever-increasing access to data and a 
solid analytical foundation, comparative MK studies fell into a 
prolonged lull. The central difficulty was the lack of a critical 
mass of scholars interested in pursuing comparative studies in 
Southeast Asia. 

At the time diffusionist views were common (pace Skeat & 
Blagden 1906) as the wave model of linguistic change, 
popularized by Johannes Schmidt (1872 etc.), seriously 
challenged the genetic theory of the Leipzig neogrammarians 
(a rivalry that periodically resurfaces among scholars who 
doubt the reality of language families as objects of study). In 
this context it is not surprising that Wilhelm Schmidt's work 
attracted severe criticism from some quarters; here is Thomas' 
tidy summation: 

Georges Maspero was very skeptical of the work of Schmidt, 
and of Schmidt's immediate predecessors. His works on Thai 
(1911) 5, Vietnamese and Thai (1920) 6 and Khmer (1915), 
made his position quite plain. He roundly condemned the 
'Mon-Annam' family of Logan, Forbes, Muller, and Kuhn, 
which included Mon, Khmer, Vietnamese, Cham, etc. 
Maspero granted Mon-Khmer and Palaungic as being related, 
but nothing else certainly related. 

He felt that the work of Skeat and Blagden had removed 
the Senoi and perhaps also the Semang from Austroasiatic; if 
Grierson's report was correct that Khasi had tones, then 
Khasi was automatically eliminated; and he felt Munda's 
relationship to Austroasiatic still unproved despite the efforts 
of Schmidt and Konow. Maspero's skepticism was healthy, 
as the proofs that had been adduced previously were far 
from being full proofs. 

His denial of Khasi, however, came from his a priori 
supposition (based on his assertion of a Thai-Vietnamese 
relationship) that Mon-Khmer languages cannot have tones. 
(Thomas 1964:154-155) 

With an authority of Maspero's stature to contend with, it 
would take a brave scholar indeed to plow ahead with overt 
comparative work within the framework that had been 
established by Schmidt. Quite predictably, specialists and 
commentators began to line up against the Austroasiatic 
hypothesis in its various forms. Perhaps the most important of 
these was an influential 1942 paper by Sebeok. In it, Sebeok 
asked "But is there an Austroasiatic sub-group at all?" (p.211), 
then responded to his own rhetorical question by citing the 
negative views of Maspero, whom he characterized as "the best 
authority" on the languages. 

4.1. Two transitional figures 

Despite these negative views a solid line of inquiry based on 
the epigraphic/philological tradition continued at SOAS 
(London). It was this stream that would keep comparative MK 
studies alive until an entirely new tradition, based upon 
fieldwork on living languages, emerged in the 1960s. The key 
transitional figures were Charles Otto Blagden (1864-1948), 
and Gordon Luce (1889-1979). 

Working at SOAS well into the 20th century as Professor of 
Malay, Blagden had moved on from his earlier work on Aslian 
to develop a considerable interest in Mon, and had prepared a 
preliminary etymological dictionary on index cards by 1928. 
Never completed, Blagden's notes informed various research 
publications, and five publications (four fascicles and a volume 
of plates) of the Epigraphia Birmanica between 1923 and 1936. 
These notes were eventually passed on to Luce. 7 

An indefatigable field worker, Luce compiled thousands of 
pages of notes and wordlists for the languages of Burma. 8 Luce 
clearly meant to carry through Blagden's dream of a Mon 
etymological dictionary. He expanded and corrected Blagden's 
corpus of transcribed Mon texts, developed a set of lexical 
comparisons, and also improved the morphological analyses of 
the language. Yet Luce's principal linguistic passion was for 
Burmese and its relatives, so he ultimately passed on the torch 
of Mon studies to Shorto, who met the challenge admirably. 

It is important to consider that despite Luce's enthusiasm 
for field data, his historical approach was fundamentally 
philological, favoring languages with long written traditions 
(as well as architecture and material culture) for their capacity 
to offer direct witness to history. This comes through strongly 
in his most important publications, including the three-volume 
Old Burma-Early Pagan (1969-70), the posthumously 
published two-volume Paris lectures Phases of Pre-Pagan 
Burma (1985) and his A Comparative Word List of Old 
Burmese, Chinese and Tibetan (1981) (the latter having grown 
out of research for his extensive unpublished draft dictionary 
of Old Burmese). The role of minor living languages in Luce's 
method was always a supporting one, helping to clarify and 
contextualize the histories of the major languages and the 
cultures they reflect. 

5. The Rebirth of Comparative Mon-Khmer Linguistics 

The mid-20th century saw a rebirth of interest in comparative 
MK studies. Haudricourt (1952, 1953, 1954) breathed new life 
into the field with his proof that Vietnamese tones could be 
explained by direct reference to MK etymology, rather than to 
Thai or Chinese. Haudricourt is the real hero of the era; his 
studies produced a powerful demonstration of the insight into 
phonology and proto-history that could be offered by the 
comparative method. Maspero's Chinese Wall of doubt 
tumbled, setting the stage for the return of comparative MK 
reconstruction to academic respectability. 

Yet there was still a major hurdle that would not be 
conquered in the 1950s. In the vast linguistic hotbed of 
Mainland Southeast Asia, language groups were not 
necessarily self-evident, and the state of classification was 
dismal. Aslian and Palaungic were recognized as distinct 
complex branches, but their real extents were unknown. Nor 
was there any clear understanding that the scores of minor 
languages found in the region that stretches from central 
Cambodia through Isaan and along the Annamite range, up into 
Yunnan and Sipsongbanna, would eventually be resolved into 
Bahnaric, Katuic, Pearic, Vietic, and Khmuic. Absent a real 
delineation of sub-groupings, comparative studies floundered. 

A way out of the morass came with the application of 
lexicostatistical methods. Although roundly condemned in 
many quarters as invalid (especially since Bergsland & Vogt 
1962; see also ten pages of vicious denunciation in Campbell 
1998), lexicostatistics has enjoyed considerable success in 
Southeast Asia as a useful heuristic classification. 
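
The basic arithmetic behind such heuristic classifications can be illustrated with a minimal sketch. It assumes that cognacy judgments have already been made by hand for a fixed list of basic meanings, and it simply counts the percentage of meanings on which each pair of languages agrees. The function name, the language labels and the judgments themselves are invented placeholders for illustration, and are not drawn from any of the studies discussed here.

```python
from itertools import combinations


def shared_cognate_percentages(cognate_classes):
    """Return {(lang_a, lang_b): percent of compared meanings judged cognate}.

    cognate_classes maps a language name to {meaning: class label}. Two words
    count as cognate when they carry the same label for the same meaning.
    Meanings missing from either language are left out of the comparison.
    """
    result = {}
    for a, b in combinations(sorted(cognate_classes), 2):
        compared = [m for m in cognate_classes[a] if m in cognate_classes[b]]
        if not compared:
            continue  # nothing to compare for this pair
        hits = sum(cognate_classes[a][m] == cognate_classes[b][m] for m in compared)
        result[(a, b)] = 100.0 * hits / len(compared)
    return result


# Invented placeholder judgments (hypothetical, not real language data).
toy = {
    "LangA": {"eye": 1, "fire": 1, "water": 1, "hand": 1},
    "LangB": {"eye": 1, "fire": 1, "water": 2, "hand": 1},
    "LangC": {"eye": 1, "fire": 2, "water": 3, "hand": 2},
}

for pair, pct in shared_cognate_percentages(toy).items():
    print(pair, pct)
```

Pairs with higher percentages are provisionally grouped together. The figures are only as good as the cognacy judgments that feed them, which is why the method is best treated as a preliminary heuristic.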

Preliminary private studies were followed in print by 
Thomas (1966) and Thomas and Headley (1970). These 
effectively established the pattern of MK sub-groupings, 
supported by further studies of Huffman (1976) and Smith 
(1981). The major early result was that the Bahnaric and 
Katuic families, which together account for perhaps forty 
percent of the diversity among MK languages, were correctly 
delineated, as were some of their most important 
sub-groupings. 

1959 was an especially good year for comparative studies 
with the appearance of Heinz-Jürgen Pinnow's Versuch einer 
historischen Lautlehre der Kharia-Sprache, as well as the 
publication of the first volume of the huge, multi-dialectal 
Bahnar dictionary of Guilleminet and Alberty (1959, 1963). 
Pinnow's understated title suggests a tentative historical 
phonology of Kharia, a Munda language of India. In a more 
accurate view, in 514 pages of dense text Pinnow attempted to 
build nothing less than an Austroasiatic etymological 
dictionary, with some 550 etyma over a representative set of 
languages. Issues of Kharian phonological evolution were 
handled in the context of a preliminary Proto-Munda and 
Proto-Austroasiatic reconstruction. More than 400 cognate sets 
supported the proto-vocalism, and more than 500 supported the 
consonantism. 

Pinnow made extensive MK comparisons, with the 
implication that regular Munda: MK correspondences would 
reliably establish and reconstruct ancient etyma within the MK 
family. But Pinnow's results were hampered by a lack of data, 
and by problems of interpreting sources. His book's ultimate 
impact on the field was not nearly commensurate with the 
effort that had gone into it (see the witty review by Shafer 
1960). 

Two other major comparative MK studies also appeared at 
this time: Shafer's Études sur l'Austroasien/Studies in 
Austroasian in 2 parts (1952 and 1965). But like Pinnow's 
work, these studies did not deliver extensive or useful Proto- 
MK reconstructions, and thus really did not advance things 
much beyond the pioneering framework advanced by Schmidt 
half a century before. 

Nevertheless, the spark had been lit, and as the 1960s and 
70s progressed MK research blossomed. A generation of 
young scholars (many associated with the American-based 
Summer Institute of Linguistics) traveled to Indo-China, 
collected data on many previously little-known and 
undocumented languages, and began preparing various 
sub-group-level reconstructions. Significantly, many of this new 
generation were schooled in the Bloomfieldian structural- 
anthropological approach to linguistics that emerged so 
strongly, especially in the English-speaking countries, post 
Bloomfield (1930). Among the SIL tendency this morphed into 
the tagmemic tradition developed by Kenneth Pike, such that a 
significant stream of SEAsia-orientated field linguistics was 
by-passed by the Chomskyan revolution and its underlying 
hostility to empirical and comparative-historical methods. 
Showing extraordinary foresight, in 1964 this same tendency 
founded the journal Mon-Khmer Studies in Saigon (nowadays 
based at Mahidol University, Thailand). 

European interest was also rekindled at this time. For 
example Michel Ferlus (CNRS, France) began his decades- 
long commitment to data collection and historical analysis of 
various languages of Thailand, Laos and Vietnam. 
Scandinavian scholars such as Kristina Lindell, Jørgen Rischel and 
Søren Egerod became active, working mainly on languages 
accessible from Northern Thailand. And in Thailand among 
local linguists an admirable tradition of descriptive studies 
developed, particularly at Mahidol, Chulalongkorn and 
Thammasat Universities. 

Yet it was among the young American-trained field 
workers that an important methodological shift in comparative 
studies began to become apparent. Philological methods were 
effectively eschewed in favor of a purely field-data-based 
approach to comparative reconstruction. Not just driven by the 
increased availability of field data, it reflected the underlying 
Bloomfieldian biases of the researchers, becoming an explicit 
programmatic commitment most highly developed in the work 
of Gerard Diffloth through the 1980s (discussed below). Yet 
others would continue with a more European approach to 
historical linguistics, seeking to get the most from both old and 
new methods, not necessarily ruling out any particular 
categories of data (especially, e.g. Michel Ferlus). 

The surge in worldwide interest in MK and Austroasiatic 
linguistics stimulated a major international conference at the 
University of Hawai'i in 1973 (the first ICAAL), out of which 
a substantial two-volume set of proceedings was published in 
1976 (Jenner et al.). A second ICAAL was held at Central 
Institute of Indian Languages in Mysore in 1978. 

However, the times were both a high point and a turning 
point for the field. The dramatic political changes in SEAsia 
after 1975 saw Western field workers locked out of Indo-China, 
while at home there was a general decline in support for 
academic work related to the region. There would be no more 
MK/Austroasiatic themed conferences and instead such work 
as continued was presented at meetings primarily devoted to 
other language families, or at conferences of a general 
SEAsian theme. The tendency for collaborative work also 
declined, evidenced in the lack of jointly authored comparative 
studies between the late 1970s and 1990s. 

Thus the comparative work that continued to be published 
after this time, although more developed, was the product of a 
narrowing research community, one which lacked an 
organizational or programmatic unity. 

6. Summary Review of Mon-Khmer Comparative-Historical 
Reconstruction in the Second Half of the 20th Century 

In the pages that immediately follow I review branch by branch 
the development of comparative-historical studies, covering as 
far as possible the whole MK family from the mid-20th century 
up to more or less the present day. The survey is reasonably 
comprehensive, although I do not pretend that it is complete. 

6.1. Bahnaric 

The Bahnaric family is arguably the most extensive and 
diverse sub-division of MK, with more than 30 distinct 
languages spread over three branches. 9 Researchers associated 
with SIL began comparative work in the mid-1960s, and by the 
1970s well-respected studies appeared in print. However, 
despite continued interest and a substantial body of emerging 
literature, only subgroup reconstructions have been published. 
No adequate Proto-Bahnaric reconstruction has been presented, 
despite the efforts of many researchers (including the present 
writer). Proto-Bahnaric is likely to be an especially difficult 
nut to crack: the family is split into several distinct language 
contact areas of great antiquity, and its multiple layers of 
borrowed vocabulary do not easily yield to analysis. 

Blood (1966) 

Henry Blood produced a reconstruction of Proto-Mnong and a 
preliminary Proto-South Bahnaric with his 1966 MA thesis 
(SIL has re-released this work, and it is sometimes dated to 
1968 or 1976). 

Ostensibly Blood's study is an attempt to deal with the 
issues raised by Thomas (1964), drawing our attention to the 
following quote: 

Still another difficulty, one that besets all Mon-Khmer 
comparativists, is the complexity of the vowel shifting that 
has taken place in Mon-Khmer, making it very difficult to 
establish regular patterns. Schmidt, after a careful survey 
of the situation, had to content himself with just statements 
about possible general trends, establishing no sound-laws. 
Other comparativists have stated flatly that regular sound- 
laws simply do not exist in Mon-Khmer vowels, and, 
indeed, no one has yet succeeded (in print, anyway) in 
establishing a regular pattern in Mon-Khmer vowel 
comparisons. (Thomas 1964:160-161) 

Using data from a half-dozen Mnong dialects, Blood 
assembled 428 cognate sets, then matched these with 
comparisons from Chrau, Stieng and Sre where available. 

Blood's reconstruction proved the possibility of establishing 
regular vowel correspondences, provided the data are recorded 
in accurate phonemic (or at least broad phonetic) notation, and 
the sub-groupings well understood. This was done by direct 
field methods, using data collected by himself and trusted 
colleagues, in stark contrast to earlier studies by Shafer, 
Schmidt, Pinnow, and others who worked from secondary 
sources. Thus, Blood's great contribution went well beyond his 
own specialty: his publication helped to launch a new tradition 
of empiricism in Southeast Asian comparative linguistics that 
set the agenda for the next 40 years. 

Thomas & Smith (1967) 

David Thomas and Marilyn Smith also explicitly took on the 
problem of vowel reconstruction in MK languages. They 
attempted a very low-level reconstruction based on a 
straightforward binary comparison of two North Bahnaric 
languages that are close enough to form a self-evident 
sub-grouping: Jeh-Halang. In their own words: 

... the study is significant in that it demonstrates the 
possibility of reconstruction of vowels in Mon-Khmer 
language comparisons, provided one starts with languages 
closely enough related and has data on dialect trends; ... 
(Thomas & Smith 1967:175) 

However, I would rather see the significance of the study in the 
fact that the investigators continued the pattern begun by 
Henry Blood. Thomas and Smith both had direct experience 
working with related languages, and for their study used recent 
field data collected by trusted colleagues in circumstances that 
they understood. 

The actual phonological reconstruction offered in the paper 
is not so significant. It does not address any of the real 
problems of North Bahnaric phonology, in particular the nature 
of the register system. The methodology employed simply 
posited no vowel change when forms were in agreement, and 
assigned an arbitrary form when they were not, without 
reference to a general theory of vocalic development. 

Smith (1967/1972) 

At the same time that David Thomas and Marilyn Smith were 
collaborating, Marilyn's husband Kenneth was working on 
Sedang dialects. In addition to much fine descriptive work, 
Smith produced a draft reconstruction under the initial title A 
Phonological Reconstruction of Proto Central North Bahnaric 
(a carbon copy is available for inspection at the David Thomas 
Library, Bangkok), and later (1972) published a revised 
version as a short monograph in the SIL Language Data series 
as A Phonological Reconstruction of Proto-North-Bahnaric. 

The study combines the fieldwork-driven empiricism of the 
SIL milieu of the day with a proper, recognizable, comparative 
reconstruction methodology. Data are regularized into 
comparable phonemic representations, languages are 
considered within a theory of their genetic relations, and a 
serious theory of the evolution of the vowel systems is 
elaborated within the context of proper distributional 
statements of segmental collocations. Formal correspondences 
are established for Bahnar, Proto-Jeh-Halang, Proto-Hre- 
Sedang, Hre, and internally reconstructed Early-Sedang. 

Smith explicitly elaborated the key methodological 
breakthrough that set the stage for solving the otherwise 
intractable vowel problem that had bedeviled comparative MK 
studies since the time of Schmidt: 

It has been found both here and in other studies of the 
Mon-Khmer languages of South-Vietnam that the 
vowels are difficult to determine apart from the final 
consonant. For this reason many analysts construct 
"rhyming dictionaries" during their initial phonological 
study. Some final consonants occur with only a limited 
number of vowels; few, if any, occur with all vowels 
(Thomas 1966). In this paper all reconstructed vowels 
and final consonants are brought together so that the 
overall effect on the vowels by a given final consonant 
can be clearly seen. (Smith 1972:4) 

As Smith notes, SIL field investigators tasked with creating 
dictionaries and phonemic writing systems had turned to the 
Tang Dynasty Chinese innovation of the rhyming dictionary, 
which solves the problem of the distributional collocations of 
vowels and finals. 

Smith extended the application of these methods from 
synchronic description to comparative-historical analysis, and 
thus demonstrated in principle and practice a solution to the 
MK "vowel problem". Concretely, Smith presented 571 
etymologies and proto-forms, listed in blocks according to 
their rhyme correspondences. 

The reconstruction is, thus, completely transparent, with the 
regularity and reliability of all correspondences immediately 
accessible and open to challenge. This is Smith's achievement, 
and is the basis of his enduring contribution to comparative 
MK studies: not the specific results of his reconstructions, but 
rather his methodological exemplar. 
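
The arrangement by rhyme that Smith describes can be sketched in a few lines. The sketch assumes that each proto-form has already been segmented by hand into onset, vowel and final consonant. It then lists the etyma in blocks by rhyme (vowel plus final) and tabulates which vowels occur before each final. The function name and the segment symbols are hypothetical placeholders, not forms taken from Smith (1972).

```python
from collections import defaultdict


def rhyme_blocks(etyma):
    """Group etyma by rhyme and tabulate the vowels found before each final.

    etyma is an iterable of (label, onset, vowel, final) tuples.
    Returns (blocks, vowels_by_final):
      blocks          maps (vowel, final) -> list of labels, in input order
      vowels_by_final maps final -> sorted list of vowels attested before it
    """
    blocks = defaultdict(list)
    vowels = defaultdict(set)
    for label, _onset, vowel, final in etyma:
        blocks[(vowel, final)].append(label)
        vowels[final].add(vowel)
    return dict(blocks), {f: sorted(v) for f, v in vowels.items()}


# Placeholder segments only (hypothetical, not real reconstructions).
toy_etyma = [
    ("E1", "k", "a", "t"),
    ("E2", "p", "a", "t"),
    ("E3", "t", "i", "t"),
    ("E4", "k", "a", "ng"),
    ("E5", "m", "u", "ng"),
]

blocks, vowels_by_final = rhyme_blocks(toy_etyma)
for rhyme, labels in sorted(blocks.items()):
    print(rhyme, labels)
print(vowels_by_final)
```

Laid out this way, gaps in the distribution of vowels before a given final become visible at a glance, and every correspondence can be checked against the etyma listed in its block.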

Interestingly Smith erred in including Bahnar within North 
Bahnaric, a possibility that he rightly acknowledged himself 
(on page 11). And in this respect Smith made the most serious 
mistake in his study. In treating Proto-North-Bahnaric as a 
register language (probably correct), he assumed that Bahnar 
lacked register because the feature had been lost. He did not 
consider the complementary possibility that register was an 
innovation of the North Bahnaric group. But this error can be 
readily forgiven. Unlike Khmer, Mon, West Katuic, etc., the 
North Bahnaric registers do not have a transparent correlation 
with initial voicing, and thus appear to be archaic in the 
absence of another explanation for their origin. 

The 1970s and 1980s 

After Smith's (1967/1972) breakthrough study, Gregerson, 
Smith & Thomas (1976) offered a substantial paper looking 
into the place of Bahnar within Bahnaric, coming to the 
conclusion that Bahnar belongs in a Central Bahnaric 
sub-group, distinct from either North Bahnaric or South Bahnaric. 
In that same paper Gregerson, Smith & Thomas list a 
bibliographical reference to a draft PhD A Reconstruction of 
Proto-South-Bahnaric by Richard Phillips (no date, and thus far 
unseen by this writer). 

The SIL archives hold several minor unpublished 
comparative Bahnaric studies from this period. Sheldon's 
(1979) 62-page essay A Reconstruction of Proto-South 
Bahnaric applies basic computational techniques to the sorting 
of 224 cognate sets. Leitch's (1981) 32-page essay essentially 
fits Rengao data into the framework established by Smith 
(1967/1972), listing Rengao cognates for 561 sets. 

Efimov (1990) 

Efimov (1990, in Russian) is a monograph-length 
reconstruction of Proto-South Bahnaric that presents 675 
etymologies, including some discussion of wider MK 
comparisons. The most remarkable fact concerning this study 
is that it is completely out of character with his earlier 
dissertation on Proto-Katuic (1983), which was more or less a 
model of reasonable reconstruction methodology. 

For Proto-South-Bahnaric Efimov posits a complicated 
historical phonology, unlike anything attested in the known 
typology of the daughter languages. The result is a 
proto-vocalism of 31 members characterized by four degrees of 
aperture, and an overly large consonantal inventory. This is 
achieved by ascribing a distinct proto-phoneme to each 
phonological correspondence without attempting to make 
distributional analyses of their complementary patterning, 
more or less as Peiros (1996) would later do for Proto-Katuic. 
Similarly, Efimov reconstructs no final velar stop; instead all 
examples of -k are derived from *-ʔ, as also seen in Peiros (1996). 
Peiros is named as a member of the editorial board that approved the 
publication, and would seem to have exercised some influence 
in its writing. 

Phraya Prachakij-karacak (1995) 

This is a translated and annotated edition of a study originally 
published in Thai. It includes wordlists for various West and 
Central Bahnaric languages, and for decades was the only 
published source for some of these. The translators, David 
Thomas and Sophana Srichampa, transcribe the wordlists from 
Thai characters into IPA, and offer several hundred 
impressionistic Proto-West-Bahnaric reconstructions. The 
volume makes an important historiographic contribution, but 
the data is limited in its usefulness by a lack of certainty as to 
the phonetic representations. 

Sidwell (1998) 

Sidwell's PhD thesis (University of Melbourne, supervisor Ilia 
Peiros) is a reconstruction of Proto-Bahnaric based on a 
comparison of five criterion languages: Sedang, Jeh, Bahnar, 
Stieng and Chrau. 780 etymologies and reconstructions are 
compiled. Data from up to 25 other Bahnaric languages and 
nine other MK branches are included in the comparative 
lexicon, but not used in the formal reconstruction. 

Methodologically the thesis attempts to make a very 
specific point concerning the combined application of internal 
and comparative reconstruction methods, using Bahnaric data 
for the demonstration. The criterion languages are first 
subjected to internal reconstruction to produce strictly 
morphophonemic representations of the data, and are then 
subjected to comparative reconstruction. 

In retrospect the thesis was not successful: the model of 
word-structure did not allow for correct analysis of initial 
clusters, the internal analysis of vocalism conflated a number 
of vowel oppositions, and it was not recognized that the chosen 
criterion languages had all been subject to strong Chamic 
influence (which might have been avoided if West Bahnaric data 
had been analyzed). In retrospect these weaknesses may have 
been minimized if a broad consultation with relevant scholars 
had been pursued, rather than narrowly sticking to advice from 
one principal supervisor. 

Sidwell (2000) 

The South Bahnaric component of Sidwell's PhD thesis (1998) 
was extracted, augmented and reanalyzed to provide the text 
for a monograph-length study, Proto-South-Bahnaric: a 
reconstruction of a Mon-Khmer language of Indo-China, 
published by Pacific Linguistics. 

The study compares internally reconstructed Stieng, Chrau 
and Sre (Koho) with Central Mnong, East Mnong, and Ma, 
compiling 829 etymologies and reconstructions. It is somewhat 
more successful than Sidwell (1998), but it still suffers 
especially from lack of an appropriate morphological model, 
and a failure to appreciate the role of Chamic lexical influence 
in the history of South Bahnaric (and Bahnaric more 
generally). 

Jacq & Sidwell (2000) 

Recognizing the severe problems caused by failure to 
appreciate the West Bahnaric languages in Sidwell (1998), 
Jacq and Sidwell in 1998 began several years of joint fieldwork 
in the Lao PDR, which ultimately contributed to two important 
projects: a reconstruction of Proto-West-Bahnaric and a 
descriptive grammar of Loven/Jruq (the latter an MA thesis 
that Jacq completed at the Australian National University in 2001). 

A Comparative West Bahnaric Dictionary (Jacq & Sidwell 
2000) is the first of two attempts by these collaborators to 
produce a comprehensive Proto-West-Bahnaric reconstruction. 
1023 West-Bahnaric etymologies and reconstructions are 
presented. Regrettably the results of the project were 
compromised by several factors: the monosyllabic root theory 
was applied, leading to a misanalysis of initial clusters; an 
overly phonemic approach was applied, which oversimplified 
the proto-vocalism; and a defective lexicostatistical matrix was 
used in the discussion of classification. Fortunately these issues 
were addressed in the follow-up study by Sidwell and Jacq 
(2003). 

Theraphan L-Thongkum (2001) 

Chapters Five and Six of this substantial study deal with the 
Bahnaric family, and offer reconstructions for Proto-Bahnaric, 
Proto-West-Bahnaric and Proto-North-West-Bahnaric. 

Chapter Five discusses Bahnaric internal classification at 
length, including both lexical and phonological criteria, and 
divides the family into five branches: South-, Central-, 
North-West-, North- and West-Bahnaric. Phonological inventories 
and morphological systems of West- and North-West-Bahnaric 
languages are presented and discussed. The basis of the 
phonological reconstructions is briefly discussed, although 
tabled correspondences are only given for the diphthongs 
(principally to deal with the defective treatment of diphthongs 
in Jacq & Sidwell 2000). 

Chapter Six begins with a list of 262 Proto-Bahnaric 
reconstructions, followed by presentation of approximately 
2,300 West- and North-West-Bahnaric etymologies and 
reconstructions. As with the earlier chapters on Katuic in the 
same volume, the data are organized semantically. The 
reconstructions lack explicit correspondences justifying their 
formulations, but the primary data itself is of the highest 
quality and extremely useful. 

The most problematic aspect of the study is the 
reconstruction of a North-West-Bahnaric sub-group proposed 
to consist of Kasseng, Tariang, Yaeh and Alak. Theraphan's 
theory has received some support, and Diffloth (2005) included 
the North-West-Bahnaric branch in his most recent 
classification (for a review of Bahnaric classification see 
Sidwell 2002). 

In my alternative analysis North-West-Bahnaric is not a 
genetic grouping as such. Rather, Kasseng (a Lao designation 
lumping various peoples of the Upper Sekong Valley) and 
Tariang are one language, properly called Tariang, which may 
be classified as West-Central-Bahnaric. Yaeh is a dialect of Jeh 
(North Bahnaric) spoken in Laos, and finally Alak forms a 
separate branch that I call North-Central-Bahnaric. 

Sidwell & Jacq (2003) 

A Handbook of Comparative Bahnaric: Volume 1, West 
Bahnaric (Pacific Linguistics 2003) reflects new fieldwork 
data obtained and analyzed in order to address the problems of 
Jacq & Sidwell (2000). Much of the new analysis followed on 
from wide ranging discussions with relevant scholars, 
especially Michel Ferlus. 

The study compiles 1094 etymologies and reconstructions. 
The phonological history of the sub-group is more or less 
completely worked out, and includes a discussion of the effects 
of areal contact with Katuic, which manifests as traces of a 
glottalization feature in the phonology. The internal 
classification of the West Bahnaric languages is determined on 
the basis of comparative phonology. This generates a model of 
the West Bahnaric homeland location, and the path of splits 
and migrations leading to the present distribution of daughter 
languages. The full extent of the West Bahnaric languages is 
established, with maps and demographic data given for each. 
The book is intended to be the first of a short series on the 
Bahnaric family culminating in a reconstruction of Proto- 
Bahnaric to appropriately supersede the model offered by 
Sidwell (1998). 

6.2. Katuic 

The Katuic languages have been the subject of several articles and 
book length reconstructions since the 1960s. We can now say 
that the history of the group is quite well understood, and the 
proto-lexicon is extensively documented. 

Thomas (1967/1976) 

The first attempt at a Katuic comparative reconstruction is 
Dorothy ('Dot') Thomas's master's thesis, submitted to 
University of North Dakota in August 1967, and published as a 
Summer Institute of Linguistics Workpaper in 1976. I was 
fortunate to obtain a duplicate of the original carbon copies 
from the David Thomas Library, Bangkok. 

Thomas characterizes the work as a reconstruction of Proto- 
East-Katuic. Three languages are compared: Bru, Pacoh and 
Katu (note that a contemporary description of Katuic may list 
more than 20 named languages). These are treated as East-Katuic 
in a default classification, in contrast to a western sub-branch 
consisting of Kui (in its diverse dialects). Nowadays we 
would treat Bru as sub-grouping with Kui, so that technically 
the languages Thomas used are sufficient to reconstruct Proto 
Katuic. Using then-unpublished wordlists, Thomas assembled 
667 etymologies. Comparisons with non-Katuic languages 
were not made. 

Thomas' phonological reconstruction was rather complex, 
positing four series of initial stops, 22 monophthongs and 14 
diphthongs. The analysis of vocalism effectively treated the 
Bru system as conservative, and derived the Pacoh and Katu 
vowels by various shifts and mergers. Vowel registers were not 
analyzed, but merely left as an open question. Despite 
subsequent work by other researchers, Thomas' reconstruction 
remained influential well into the 1990s. It significantly 
informed Peiros (1996) and was used by Thurgood (1999) in 
his analysis of loans in Chamic. The analysis of the initials was 
not taken up by later researchers, as it has both typological and 
etymological difficulties. External comparisons show that 
Thomas' aspirated series corresponds to EMK plain voiced 
stops, while her voiced and imploded series both reflect a 
single EMK imploded series. 

In retrospect Thomas’ main error lay in neglecting the 
analysis of registers, so that she did not connect this important 
aspect of the vocalism with the historical voicing structure of 
the consonants. This oversight can be forgiven, however, if we 
recognize that at the time, philological analysis of Mon and 
Khmer was not yet widely discussed or understood among 
Southeast Asian linguists whose primary focus was fieldwork 
(and brings all the more credit to Ferlus, below). 

Ferlus (1974) 

This paper discusses the Katuic language Ong, analyzing its 
historical phonology, and offering a model for the development 
of its glottalized registers. Although the Ong registers are 
structurally different from the Mon and Khmer types, Ferlus 
attempts to relate them by extending the general model of 
register formation derived from the philological tradition. Here 
we see the beginning of an important subsequent characteristic 
of the research of Ferlus: applying insights from the history of 
written languages to the interpretation of field data of minor 
languages. 

Diffloth (1982) 

Diffloth's 1982 essay is an excellent analysis of the problems 
of Katuic historical phonology. It specifically focuses on the 
different paths of register development vis-a-vis West Katuic 
and Pacoh, and offers a summary Proto Katuic reconstruction 
supported by 138 numbered etymologies. 

The paper is not an attempt at a comprehensive account of 
the history of the family. Rather it gives two main findings. On 
the one hand, the phonetic history of Kui and Bru parallels that 
of Khmer in many respects, with a 'normal' path of devoicing, 
register formation, and vowel splitting. On the other hand, 
Pacoh is a "registrogenese heretique", the problem being that 
one cannot connect its registers with consonant devoicing. 
Diffloth's solution to the Pacoh issue is both elegant and 
insightful, correlating register with various vowel shifts and the 
consequent phonemicization of voice quality features related to 
vowel height/openness. 

Beyond Pacoh, the great contribution of Diffloth's paper to 
theory is to clearly establish that there is no single universal 
process underlying the formation of vowel registers in MK 
languages. Concretely he sketched out the phonemic system of 
Proto Katuic, which is only moderately affected by later work; 
I would offer a slightly different analysis of phonetic values of 
some proto-vowels, and the number of proto-vowels involved 
(see Sidwell 2005). 

Efimov (1983) 

Efimov's Kandidat thesis (in Russian) was completed in 
Moscow; recently Alexsander Kassian kindly sent me a scan of 
the carbon copy held in the library of the Far Eastern Institute. 

Efimov worked at the same time as Diffloth and 
consequently did not have the benefit of the latter's essay. 
Efimov's thesis makes a sincere attempt at a comprehensive 
account of Katuic historical phonology, and is a remarkably 
good piece of work given the limited data to which he had 
access. Efimov worked with five criterion languages: Katu, 
Suoei, Bru, Pacoh and Kui. Only Katu is represented by an 
extensive source—Costello's 1971 multi-dialectal dictionary. 
The other languages were represented by fragmentary sources, 
mainly taken from journal articles. In this context Efimov 
assembled 407 etymologies, and presented a reconstruction of 
Proto-Katuic vocalism and consonantism. 

Given his data set, the results he presents are very 
reasonable. Efimov correctly identified the analytical errors in 
Thomas' 1967 thesis, showing that three series of initial stops 
are adequate to account for shifts in the consonantism, and he 
offered a model of vocalism that respectably takes Katu as the 
historical model. Some of the credit for Efimov's good results 
rests on his very appropriate use of several papers by Ferlus 
(specifically Ferlus 1971, 1974, 1977). 

Gainey (1985) 

Jerry Gainey wrote his MA thesis, A Comparative Study of Kui, 
Bruu and So Phonology from a Genetic Point of View, at 
Chulalongkorn University under the supervision of Theraphan 
L-Thongkum. The thesis ostensibly investigates the genetic 
relations between the three West Katuic languages, but 
achieves much more than this. Gainey's methodology involves 
the identification of shared phonological innovations, so there 
is much attention to working out details of the phonetic 
evolution of West Katuic. The study is crowned with a 
comparative lexicon of 570 items between Kui, Bru and So. 

For whatever reason Gainey declined to take the next 
logical step and formally present his work as a reconstruction 
of Proto-West-Katuic. Nevertheless, this is essentially the 
resultant effect of this under-appreciated thesis. 

Shorto (no date, manuscript) 

When the late Harry Shorto was preparing his Mon-Khmer 
Comparative Dictionary, he used data from only one Katuic 
language, Kui. He was at first under the impression that Kui 
shares a close, even sisterly, genetic relation with Khmer, and 
was quite unaware of the larger Katuic family. Consequently 
he compiled a 50-page draft Khmer-Kui historical phonology, 
with 176 lexical comparisons, and discussion of phonological 
correspondences. 

Around 1981 Shorto obtained the recently published (1980) 
Bru dictionary of See Puengpa and Theraphan L-Thongkum, 
and began to rework his theories, developing the idea of a 
"Bruan" (= West-Katuic) subgroup and reconstruction. An 
undated manuscript that came into my possession in 2005 is a 
52-page draft of a detailed comparative analysis. The historical 
phonology describes the development of Kui and Bru as 
register languages from a non-register proto-Bruan, and the 
lexical comparisons are indexed to the relevant Proto-MK 
reconstructions throughout. 

After completing his draft Proto-Bruan, Shorto obtained 
two more important Katuic dictionaries: the Pacoh dictionary 
of Watson et al. (1979), and Nancy Costello's (1971) multi- 
dialectal Katu dictionary. Shorto redrafted his corpus of 
comparisons to include the Pacoh and Katu data, creating a 
total of 794 Katuic etymologies and reconstructions. The 
historical phonology was revised, especially in respect of the 
proto-vocalism. The pages on which the correspondences were 
recompiled have not been found, although there is a passing 
reference to them in the existing notes. 

In effect Shorto drafted a monograph-length Proto-Katuic 
reconstruction, with special emphasis on the development of 
West Katuic ("Bruan"). His results and conclusions effectively 
parallel those reached independently by Diffloth (1982) of 
which Shorto was apparently unaware. It is truly a great pity 
that this study did not appear in print when it was completed in 
the mid 1980s. 

Peiros (1996) 

Ilia Peiros moved from Moscow to Melbourne in the early 
1990s, and began lecturing in historical linguistics at the 
University of Melbourne (including mentoring the present 
writer). To produce his own Proto-Katuic reconstruction Peiros 
took four principal sources of Katuic data: the dictionaries of 
Kui (Sriwises 1978), Bru (See Puengpa and Theraphan L-Thongkum 
1980), Pacoh (Watson 1979) and Katu (Costello 
1971). Methodically comparing every entry of each source 
pair-wise with the others, he produced an exhaustive 
comparative lexicon of 1241 putative cognate sets. 

Unfortunately, a series of errors undermines his work. 
Peiros misunderstands the register distinction in Bru and Kui, 
incorrectly describing it as being between "breathy" and "lax" 
voice, rather than between "breathy" and "clear". He wrongly 
analyzes the notational conventions of the sources for final 
consonants -jh, -jʔ, -wh, -wʔ, taking them as indicating -h, -ʔ 
suffixes in final clusters, apparently harking back to Schmidt 
(1906). Under the apparent influence of Schmidt (1904), Peiros 
interpreted all phonological minor syllables as prefixes, thus 
incorrectly analyzing the entire lexicon as having underlying 
monosyllabic roots. And Diffloth (1982) is mischaracterized 
and misunderstood. 

Problems continue in the phonological reconstruction. 
Peiros correlates breathy vowels and voiced proto-initials, but 
goes too far by reconstructing a voiced/voiceless opposition 
among all the initial sonorants, reconstructing a cumbersome 
and typologically marked consonant inventory. In respect of 
vowels Peiros implements an elaborated version of the 
approach of Thomas (1967), positing a Bru-like vocalism for 
the proto-language consisting of 15 simple vowels and 22 
diphthongs. No word-final velar stop is reconstructed, instead 
all modern examples are derived from a glottal stop. 

Theraphan L-Thongkum (2001) 

Comparative Katuic research took a great leap forward at the 
turn of the century with the appearance of Theraphan's 
substantial volume. Reporting on field research conducted in 
remote areas of Sekong Province (Lao PDR), Theraphan 
extended our knowledge of the Katuic family with a substantial 
comparative lexicon that included original data for Kantu, 
Ta'Oi, Kriang, Chatong, Triw and Dakkang. The latter three 
languages, perhaps more appropriately characterized as Katu 
dialects, had not previously been documented in print. There is 
also a chapter on Bahnaric (discussed above). 

The main text is in Thai, but data are given in IPA 
transcription with English glosses and are readily accessible to 
a wide audience. Data are presented in 1406 comparisons, 
reflecting more than 1300 distinct etymologies, organized 
semantically into a thesaurus, but lacking a lexical index. The 
presentation includes a comparative Katuic reconstruction and 
discussion of classification (but see below). Proto forms are 
offered for each entry, and each is labeled (PK, WK, NEK, 
CEK and/or SEK) according to the subgroup it reflects in 
Theraphan's classification. 

The reconstruction is problematic because it is not 
explicitly justified in the text or in tabular correspondences. As 
data are presented in semantic groups rather than by 
phonological order, correspondences for each proto-phoneme 
must be extracted and compiled individually. In my own 
phoneme-by-phoneme survey, I was unable to infer a 
consistent formal analysis for the reconstruction as a whole. 
Publication of such an analysis would greatly enhance Theraphan's work; 
but even without it, the book makes a substantial amount of 
new and reliable data available, and is an excellent lexical 
resource. 

Sidwell (2005) 

Sidwell presents 1395 Katuic etymologies, a reconstruction of 
the historical phonology and a phonologically motivated 
classification of the languages. The results are in principle 
consistent with Diffloth (1982), but are more extensive by an 
order of magnitude. Data for some 16 named languages are 
utilized, drawing upon all published dictionaries, SIL wordlists, 
published reconstructions and other sources that could be 
obtained at the time (including my own fieldwork in Laos). 

The major lacuna of the work, also consistent with 
Diffloth (1982), is that a complete explanation is not offered 
for the phenomenon of glottalized rhymes attested in some 
Ta'Oi dialects (but discussed in Diffloth 1989 and Ferlus 
1974). This phenomenon still awaits a proper account. 

6.3. Pearic 

Although Pearic was one of the first MK groups to be 
documented, the languages are highly endangered and 
thoroughly affected by contact with Khmer and Thai. In both 
principle and practice it is difficult to do comparative 
reconstruction on this grouping. Nonetheless a preliminary 
reconstruction has been offered, and some recent descriptive 
and linguistic salvage work has been pursued at Mahidol 
University. Useful MA dissertations from Mahidol include 
Siripen Ungsitipoonporn (2001), which contains an extensive 
comparative lexicon of two Chong dialects with more than 
1400 entries, and Noppawan Thongkham (2003), which has a 
comparative lexicon of Chong and Kasong; it presents 281 
numbered entries but has a significant number of lacunae. 

Headley (1978) 

Robert Headley produced a preliminary Proto-Pearic 
reconstruction and internal genetic classification. He relied 
mostly upon the Baradat vocabulary of 1941 and a manuscript 
Chong vocabulary provided by Franklin Huffman. From the 
perspective of comparative phonology the reconstruction is 
rather incomplete. Unfortunately, his sources did not reliably 
distinguish the four registers, so Headley simply noted the 
phenomenon of "prefinal glottals" and decided to leave the 
question to "future linguists". The result is a compilation of 
410 etymologies and reconstructions. It is expected that the 
addition of more recent field data collected by Thai linguists 
will encourage more fruitful efforts at reconstruction. 

6.4. Monic 

The Monic branch essentially consists of just two languages: 
Mon and Nyahkur. This limits the extent to which comparative 
reconstruction can be applied to investigate the history of the 
branch. Fortunately, Mon has a long written history that dates 
back some 1500 years. It has received much attention from 
philological/epigraphic studies, the most important of these 
being Shorto's (1971) A Dictionary of the Mon Inscriptions 
from the sixth to the sixteenth centuries. One attempt at a purely 
comparative reconstruction of Monic has been made, but the 
most successful efforts have combined a range of 
methodologies. 

Ferlus (1983) 

In this 90-page paper Ferlus uses both comparative and 
philological methods to reconstruct the phonetic history of 
Mon and offer a phonological reconstruction of Proto-Mon (in 
his formulation the stage before written Old Mon). 

Particular attention is paid to the evidence of loan words 
from Thai, Khmer, Burmese and Indic to inform the phonetic 
interpretation of Old Mon inscriptions. The analysis is 
illustrated with extensive explicit tables showing the 
development of Mon rhymes leading to modern forms. 
Methodologically the paper elaborates the principles employed 
by Ferlus (1975) in respect of Vietic and reflected in his work 
on other MK branches. 

Regrettably the paper does not offer results in the form of a 
consolidated comparative lexicon or index of reconstructions, 
which users would certainly welcome. Instead the reader must 
extract these from the hundreds of examples distributed 
throughout the discussion. 

Methodologically Ferlus perfects a modern version of the 
approach pioneered so masterfully by Henri Maspero in his 
(1912) analysis of the history of Vietnamese (which used 
dialect and literary evidence to conclude that the language 
originated from an ancient mixing of Tai and MK elements, 
later overlain with Chinese). Thus we see that Ferlus' work 
derives from a well developed and respected French tradition 
that draws upon various classes of evidence to inform the 
analysis. 

Diffloth (1984) 

While prepared at approximately the same time as Ferlus' 
equivalent study, Diffloth's monograph uses a completely 
different approach to historical reconstruction. Building 
on the approach associated with the SIL tendency, Diffloth 
explicitly privileges the evidence of data from contemporary 
fieldwork for comparative reconstruction over and above 
evidence from historical written sources. On the basis of the 
extensive dialect data Diffloth offers some 680 etymologies, 
and provides reconstructions stratified into Proto-Mon, Proto- 
Nyahkur and their common ancestor Proto-Monic. 

The book makes a bold case against philological methods. 
To illustrate this Diffloth gives the example of Dvaravati Old 
Mon <sran> "silver", which would seem to unambiguously 
indicate a Proto-Monic form *sraɲ. However, since no 
expected modern Nyahkur form [chrep] is found in the data, 
the principled decision is taken to not reconstruct *sraɲ 
"silver". In effect Diffloth claims that (then) contemporaneous 
written evidence is not acceptable in the absence of 
confirmatory fieldwork data recorded some 1000 or 1500 years 
removed from the fact. A different but related argument 
favoring purely comparative methods is advanced in Diffloth 
(1980), discussed below (§6.7). 

6.5. Khmeric 

The Khmeric branch is represented by a single language, 
Khmer, which shows three main dialects: Standard Khmer, 
Northern or Surin Khmer, and Western or Cardamom Khmer. 
Apparently all the known dialects derive from Khmer of the 
Angkorian period or later, and thus are younger than the oldest 
written sources. Therefore strictly comparative studies based 
on the dialects alone would not yield a reconstruction of Proto- 
Khmer, the ancestor of Old Khmer. 

Existing dictionaries of Old Khmer include Long Seam 
(2000), Pou (1992, revised 2004), and Jenner (1980-86); a new 
dictionary by Jenner is in preparation. Studies that have 
investigated the history of Khmer by analysis of the epigraphic 
record include Ferlus (1992), which also extensively uses 
evidence of loan words to reconstruct the phonetic history of 
the language. 

6.6. Khmuic 

The Khmuic branch is primarily represented by the vast Khmu' 
dialect chain that spreads across Northern Laos. Another ten or 
so member languages, all quite different from each other, are 
located around the periphery of the Khmu' area. The Khmu' 
dialects are now well documented (e.g. Suwilai Premsrirat 
2002) but published sources in regard to the other Khmuic 
languages remain mixed. So far there has been no attempt at a 
consolidated Khmuic lexicon and Proto-Khmuic reconstruction, 
although some limited historical studies do exist. A 
bibliography of Khmuic studies is given by Proschan (1996). 

Filbeck (1978) 

Filbeck's 1971 PhD dissertation, published by Pacific 
Linguistics in 1978, is a comparative study of the Mal and Pray 
dialects known collectively as T'in. The Proto-T'in 
phonological system is reconstructed on the basis of a modest 
but well-chosen comparative lexicon. The discussion includes 
consideration of the role of ancient loans from Thai, and there 
is significant discussion of the classification of T'in, 
incorporating tables of lexical comparisons with Khmu, 
Palaungic, Khmer and Mon. 

Egerod (1984) 

In a paper presented to the 17th International Conference of 
Sino-Tibetan Languages and Linguistics Søren Egerod 
presented a brief comparative study of Mlabri with 
comparisons establishing phonological correspondences with 
Kammu, Lamet, Mal, Proto-Palaung, Proto-Waic and Muong. 

Unchalee Singnoi (1988) 

This unpublished MA thesis at Mahidol University, A 
Comparative Study of Pray and Mal Phonology, develops much 
of the work in Filbeck (1978), including presentation of an 
extensive Mal-Pray lexicon (approx. 1250 entries). 

6.7. Palaungic 

The Palaungic (or Palaung-Wa) branch is widely scattered 
geographically, and is commensurately very diverse internally. 
With more than a century of accumulated data the family is 
now quite well known. Since the 1980s in particular various 
grammars and dictionaries have been produced by Chinese 
scholars. Various sub-groups have been subject to important 
comparative-historical studies and Gerard Diffloth is known to 
be working on a much-anticipated consolidated Proto-Palaungic. 
Proschan (1996) includes a select bibliography of 
Palaungic studies. 

Shafer (1952) 

The first of Robert Shafer's two Études sur l'austroasien deals 
specifically with Palaungic. Building directly on the 
framework established by Schmidt (1904), Shafer sets forth 
more than a hundred comparisons for 11 Palaungic tongues, 
establishing regular correspondences for the consonants and 
vowels. The results include a modest lexicon of 88 Proto- 
Palaungic items. A substantial part of the paper (25 pages) is 
devoted to discussing lexical parallels with Sino-Tibetan, Thai, 
and Khasi. 

Shorto (1963) 

The paper gives a brief account of word-formational patterns, 
particularly prefixation, in Palaung and related languages. 
Although not offering historical reconstructions as such, this 
study is immensely informative for comparative linguistics, yet 
this type of work has been somewhat neglected, and needs to 
be fostered. 

Luce (1965) 

In a paper ostensibly concerning Danaw, Luce provides a list 
of 245 lexical comparisons between Mon, Danaw, Riang, 
Palaung and Wa, plus an appendix of further MK and wider 
comparisons. It is apparent that this paper and related notes 
(such as preserved in the Luce archive at 
http://archives.sealang.net/luce ) subsequently informed Shorto 
(1971, 2006 etc.). 

Mitani (1977) 

Mitani's paper treats 283 comparisons between Palaung, Rumai, 
Ra-ang and Darang, so it is effectively restricted to the 
Palaung-Riang sub-group. Phonological correspondences for 
consonants and vowels are established and proto-values 
reconstructed. A proto-lexicon is not offered, but one could 
readily derive the forms implied by the phonological formulae 
offered in the paper. 

Mitani (1979) 

In a follow-up to Mitani (1977), Riang data is worked into the 
analysis to further confirm Mitani's reconstruction of 
Proto-Palaung-Riang. 

Diffloth (1980) 

Filling an entire issue of Linguistics of the Tibeto-Burman Area, 
this monograph-length study is to date the most important 
published contribution to comparative-historical studies of the 
Palaungic languages. Diffloth marshaled all of the then- 
available sources for languages of the Waic sub-branch (6 
main sources, plus 28 lexicons of varying quality) to propose a 
comprehensive reconstruction of Proto-Waic. 544 etymologies 
are compiled and organized into a hierarchy of Proto-Wa-Lawa 
(PWL) and Proto-Waic (PW). This reflects the division of 
Waic into two sub-branches, Wa-Lawa and Samtao. Where Wa 
and Lawa reflexes are present a PWL form is reconstructed. 
Where additionally there is a reflex found in the Samtao 
sub-branch a PW form is also reconstructed. No distinct 
Proto-Samtao reconstructions are offered. 

Generally the study is a model of comparative 
reconstruction, and Diffloth takes the opportunity to offer it as 
an exemplar of his preferred methodology. The study opens 
with the following explicit statement of Diffloth's 
programmatic perspective: 

Among the fourteen or so extant branches of the Mon- 
Khmer family, only three or four have developed and 
preserved enough differentiation today to yield proto-branch 
reconstructions of great antiquity. They are: the 
Bahnaric, the Aslian, the Palaungic and probably the Viet- 
Muong branches. It is mostly from these reconstructions 
that we will be able some day to cast a glance at Proto- 
Mon-Khmer and beyond. The Katuic, Khmuic and 
Nicobarese branches, while extremely useful, do not 
appear to be as diversified as the first four; Monic and 
Khmeric, in spite of their written records and resulting 
prestige, ironically rank lower in this respect. (p.3) 

Svantesson (1988) 

This descriptive and historical paper goes into great detail 
analyzing and explaining the odd phonetic history of U, with 
extensive comparisons to other Palaungic data (especially with 
Hu and Lamet). 

Paulsen (1989/1991) 

Originally an MA thesis (1989), this study was subsequently 
published as a long paper in Mon-Khmer Studies (1991). It 
nicely complements Diffloth (1980), since it is effectively a 
reconstruction of the Samtao sub-branch of Waic, which lacks 
a distinct reconstruction in Diffloth's treatment. The results 
include a clear historical phonology, and an etymological 
lexicon of 556 entries and reconstructions, comparing Plang, 
Shinman and Pangloh, drawn from the author's own field data. 

Curiously the study only mentions Diffloth (1980), but 
appears not to make any use of it. Thus the Plang data and 
etymologies are analyzed without reference to the already known 
Waic context, and consequently a number of Thai and other 
loans are erroneously treated as Plang. 

Diffloth (1992) 

A short follow-up to Diffloth (1980), this paper discusses data 
from the recently surveyed Bulang sub-branch of Waic, with 
reference to his earlier Waic reconstruction. It is a pity that 
Paulsen's (1989/1991) related results are not discussed. 

6.8. Aslian 

The Aslian branch, with a score or more distinct languages, is 
comparable to Katuic in its internal diversity, and may thus 
make a significant contribution to Proto-MK reconstruction. 
Exciting early progress at the beginning of the 20th century 
was followed by a long lull. The Malayan Emergency (1948- 
1960) meant that circumstances were not ripe for a revival of 
fieldwork until the 1960s; then a surge of effort and some 
comparative studies appeared. 

More recent fieldwork initiatives are now helping to 
complete our knowledge of the family (e.g. Kruspe 2004 on 
Semelai, and now working on Ceq Wong and Mah Meri; 
Burenhult 2005 on Jahai, and now working on various North 
and Central Aslian languages). Consequently the volume and 
quality of published and unpublished Aslian lexicon that could 
be marshaled by an effective collaborative effort would go a 
long way towards facilitating a comprehensive Proto-Aslian 
reconstruction. 

A selective bibliography of Aslian studies is provided by 
Bishop & Peterson (1994), and there is a long descriptive article 
on the Aslian branch by Matisoff (2003). 

Diffloth (1968) 

This short paper marks the rebirth of comparative Aslian 
studies in the second half of the 20th century. A modest 
amount of data from four Semai dialects (collected in the field 
by the author) is presented in support of regular 
correspondences that lead to a partial reconstruction of Proto- 
Semai phonology. 

Benjamin (1976) 

This is a major essay that discusses the classification and 
disposition of the Aslian family. The remote history of the 
Aslians is reconstructed to some extent with reference to 
migrations, contact and borrowings. The Skeat & Blagden 
materials are discussed and a very useful concordance of 
language names with contemporary equivalents is presented. A 
basic vocabulary of 146 terms for 22 named languages crowns 
the paper. 

Diffloth (1975) 

In this 19-page paper the classification of Aslian into three 
sub-branches, originally proposed on lexical grounds, is further 
confirmed by historical phonology. With a minimal number of 
illustrative etymologies, outlines of the phonological histories 
of the Northern, Central and Southern sub-branches are 
sketched. 

Diffloth (1977) 

This paper is the better-developed successor to Diffloth (1968). 
It is based on comprehensive data collection over many years 
in the Semai-speaking communities. 284 numbered 
etymologies illustrate phonological correspondences and a very 
detailed historical phonology. The paper is copiously 
illustrated with diagrammatic explanations of the historical 
vowel changes, including a large, unified flow chart of the 
chronology of vowel innovations. The author mentions having 
collected an Aslian lexicon in excess of 26,000 items; it is 
regrettable that this resource remains unpublished/inaccessible. 

Diffloth (1979) 

This short paper puts the results of Diffloth (1977) into broader 
MK perspective, and includes a very useful comparison, 
focused on pronominal systems, between Aslian and other MK 
languages. 

Phillips (2005) 

Within these unpublished but widely circulated 115 pages, 
Phillips reports on an extensive survey of Semai dialects, and 
elaborates Diffloth's (1977) reconstructions with several 
hundred new reconstructions. 

6.9. Vietic 

The Vietic branch, and in particular the Viet-Muong 
sub-branch, has attracted considerable attention in the form of 
comparative research, especially since the breakthrough studies 
by Haudricourt in the 1950s. With the increasing availability of 
field data on minor Vietic languages since the 1970s, a better 
understanding of the extent and typology of the branch has 
emerged. Draft consolidated Proto-Vietic reconstructions are 
now circulating among specialists. Barker (1993) provides a 
bibliography of Vietic studies. 

Cuisinier (1951) 

In a paper documenting Muong ritual texts, Cuisinier presents 
considerable dialect data, and sets forth phonological 
correspondences between Vietnamese and Muong, although 
without attempting reconstruction. 

Haudricourt (1952, 1953, 1954) 

As discussed elsewhere in this paper, these ground-breaking 
papers established the solid MK ancestry of Vietnamese, and 
set the agenda that has guided the reconstruction of Vietic 
linguistic history ever since. Crucial to the argumentation was 
the use of comparative data from Khmu' that permitted the 
reconstruction of finals *-ʔ, *-h and *-s, supporting the new 
model of tonogenesis. 

Barker (1963) 

This modest paper (nine pages) is the first in a series by Milton 
Barker (and also later Muriel Barker) contributing to the 
reconstruction of Proto-Viet-Muong. The Barkers had done 
field work on Muong dialects in the early 1960s, stimulating 
their interest in the relation with Vietnamese. This paper 
discusses the reconstruction of initial labial consonants. 

Barker (1966) 

A large amount of data is packed into this paper's 16 pages. 
Some 600 Vietnamese-Muong lexical comparisons are given, 
illustrating the tonal correspondences between Vietnamese and 
Muong. The paper demonstrates that their systems share a 
common origin, with the five-tone Muong system readily 
derived from the six tones preserved in Northern Vietnamese. 

Barker & Barker (1970) 

On the basis of 210 illustrative comparisons between 
Vietnamese and Muong the Proto-Viet-Muong finals and 
vowels are reconstructed. Some ambiguity is unresolved 
concerning the phonetic values of several vowel phonemes, 
reflecting the fact that the paper does not take the analysis 
further by invoking wider Vietic comparisons. 

Ferlus (1975, 1982, 1991, 1992a, 1996, 1997, 1998, 2001, 2005)

In a series of related papers since the early 1980s Ferlus has 
continued to develop his reconstruction of Vietic history. He 
progressively takes into account more data from Minor Vietic 
languages, and develops a comprehensive theory concerning 
the early influence of Chinese, and the emergence and nature 
of tones and registers in Vietic. 

In the first of these papers (1975) Ferlus sets forth a 
comprehensive account of the phonological history of Viet- 
Muong within the broader MK context, given then-available 
data. Each aspect of the historical phonology is dealt with in a 
systematic way, invoking wider MK comparisons, and, for the 
first time, data from other minor Vietic languages that were 
just starting to be documented (especially Sach, Thavung and 
some Pakatan). The interpretation of philological data, 
especially de Rodes' Middle Vietnamese dictionary, also plays 
an important role. We also see the very beginnings of Ferlus' 
spirantization theory that would subsequently be so influential
in his comparative reconstruction. Typical of Ferlus' style, the
paper does not contain a consolidated comparative vocabulary, 
but features scores of illustrative items throughout the text. 

The subsequent papers build on and improve the model, 
especially as more data from minor Vietic languages is 
integrated. Since 1991 Ferlus has been circulating a 
consolidated Proto-Vietic lexicon in various drafts, and is now 
preparing a "final" version with over 3000 entries. The latter is 
keenly anticipated for the fundamental contribution it is 
expected to make to deeper MK reconstruction. 

Thompson (1976) 

This substantial paper attempts to reconstruct Proto-Viet- 
Muong as extensively as the available data would permit. 
Thompson provides a consolidated comparative vocabulary of 
approximately 700 entries, with Vietnamese and three Muong 
dialects. Reconstructed forms are offered for about 600 of 
these. There are 49 pages of discussion and analysis, which are 
based entirely upon comparative reconstruction. 

Huffman (1977) 

In this paper Huffman revisits the thorny issue of the place of 
Vietnamese within MK, and sets forth a comparative basic 
vocabulary with illustrative entries from most branches of MK. 

Sokolovskaia (1978)

This very important paper presents a reconstruction of Proto- 
Vietic, utilizing a total of 11 sources. They include numerous
Muong dialects, minor Vietic languages such as Thavung, 
Pakatan and Sach, and Vietnamese dialects. The bulk of the 
text is found in its etymological lexicon of approximately 650 
entries. The methodology is entirely comparative, and wider 
MK data are not discussed. I understand that Sokolovskaia 
subsequently produced an extensively revised and expanded
treatment, the manuscript of which was offered to Mon-Khmer
Studies for publication in the 1980s. However, Sokolovskaia
passed away, and the only hard copy of her manuscript was 
misplaced and has not been seen since. 

6.10. Nicobaric, Khasic 

Unfortunately there is no effective tradition of published 
comparative Nicobaric or Khasic studies of which I am aware, 
beyond the inclusion of various lexical comparisons in works 
of wider scope such as Shorto (2006). There are two 
unpublished preliminary Proto-Nicobarese reconstructions 
cited in various bibliographies (apparently listing a couple 
hundred comparisons), although I have seen neither: Zide and
Dwarikesh (1962), Critchfield (1963). 

6.11. Proto-Mon-Khmer 

It was quite an extraordinary disappointment that in the years 
which immediately followed the descriptive and comparative 
effervescence of the 1960s and 70s, no major work appeared 
that surveyed, consolidated and analyzed the newly available 
field data and reconstructions, relating them to the frameworks 
sketched out by Schmidt, Shafer, Pinnow etc. Lacking such an 
authoritative canonical reference work and the programmatic 
impetus it would have provided, the field drifted and stagnated. 

Nonetheless some individuals did pursue their own
compilations and analyses. In this respect the highest
expectations fell upon Gerard Diffloth. He ceased publishing
branch-level reconstructions after 1984, although he did not halt
his comparative research. Since the late 1970s he has been
actively compiling a consolidated MK comparative lexicon and 
reconstruction of Proto-MK. Many of his notes from the time 
still exist and can be seen at the Cornell University Library. By 
1980 the project was already so well developed that he applied 
for and received National Science Foundation funding for A
Mon-Khmer Etymological Lexicon. The application included a
draft chapter reconstructing the Proto-MK faunal lexicon, 
indicating that even then the project was certainly at a stage 
that would have been useful and welcomed by the wider 
research community. A decade later a related project was also
funded (Khmer (Cambodian) Etymological Lexicon, NSF 1988-1990
and NEH 1989-1991), predicated on the claim that a
sufficient MK etymological lexicon had already been compiled
and analyzed to inform etymological dictionaries of individual 
languages. 

In Moscow through the 1980s Ilia Peiros was also 
preparing his own MK comparative lexicon and reconstruction. 
In Peiros (1998) he describes the methods he used, and 
mentions a set of 1500+ etymologies compiled on ledger cards. 
More than 100 of the reconstructions formed part of his 1989 
paper in which he seeks to link Southeast Asian language families
with the Nostratic and Sino-Tibetan reconstructions of Sergei
Starostin. Recently Peiros has made his comparative data and
results available on-line through the Tower of Babel project site
(http://starling.rinet.ru/), including some 2,246
Proto-Austroasiatic reconstructions (by my latest count). It is very
welcome to have this material available on-line, although the 
underlying analysis is difficult to assess. There is no supporting 
documentation justifying the reconstruction, such as statements 
of correspondences or distributions, or other explanatory text. 
Generally the approach has been to compile as extensive a list 
as possible based on a few criterion languages/sub-groups 
(especially Written Khmer), such that most sets invoke data 
from only one, two or three MK branches. The same site also 
provides comparative data and reconstructions for various MK 
branches, although the bulk of these appears to be reproduced
from publications by other authors.

Both Diffloth and Peiros consistently shy away from 
philological methods, preferring to use the most reliable and 
recent field data they can obtain. By contrast, Harry Shorto
pursued his comparative MK project solidly on the basis of
more traditional methods which especially privilege epigraphic
data. He believed that written evidence that dates back 1,000-1,500
years, and is thus a much closer witness to the proto-language,
is inherently preferable to modern forms, which
are that much further removed from the mother tongue (Shorto
1965 elaborates on the theory and method of using epigraphic 
data). On that basis Shorto went ahead to produce his own 
version of the canonical reference that he hoped would do 
justice to the field, but which unfortunately would have to wait 
30 years after the 2nd ICAAL and 12 years after his own 
passing to appear in print. 

Shorto (2006)

Shorto's passion was Mon philology. Having taken up the torch 
from Blagden and Luce after accepting a lectureship at SOAS 
in 1948, Shorto devoted himself to the study of this Southeast 
Asian language with the longest and most complete written 
tradition. He published his A Dictionary of Modern Spoken 
Mon in 1962, and followed this up with A Dictionary of the 
Mon Inscriptions in 1971. The latter included the extensive 
etymological commentary (largely drawn from Schmidt 1905, 
Skeat & Blagden 1906 and Gordon Luce's notes), which he 
subsequently reworked to form the basis of his MKCD, the 
results of which are summarized below. 

The proto-MK consonants presented by Shorto are shown 
below. They match the table offered by Diffloth in his 1974 
Encyclopaedia Britannica article (and elsewhere), reflecting a 
more or less consistent view that can be traced back to the 
foundation laid by Schmidt:

    *p    *t        *c    *k    *ʔ
    *b    *d        *ɟ    *g
    *ɓ    *ɗ
    *m    *n        *ɲ    *ŋ
    *w    *r, *l    *y
    *s                          *h

Additional proto-segments *t2, *d2, *n2 were added by Shorto
in his second draft of the MKCD. The first two of these are
posited (in most cases) to account for the correspondence of
preconsonantal s in Northern MK and Munda with t in other
MK languages (e.g. *t2ŋiiʔ 'sun, day' on the basis of forms
such as Palaung səŋi and Mundari siŋgi versus Khmer thŋay
and Old Mon tŋey). *n2 is suggested to explain a
correspondence of prevocalic laterals in Northern MK and
nasals elsewhere (e.g. *bn2tes 'spear' on the basis of such forms
as Riang-Lang ples and Old Mon bnos).

Shorto reconstructs PMK initial consonant clusters based 
upon the following relation between the registers of Mon and 
Khmer: 


    PMK                      Mon               Khmer
    voiceless + voiceless    head register     head register
    voiceless + voiced       chest register    head register
    voiced + voiceless       head register     chest register
    voiced + voiced          chest register    chest register

This cluster reconstruction appears to hold up fairly robustly, 
although necessarily it will not recover features that have been 
lost due to parallel developments in the two criterion languages.

The proto-vowel inventory suggested by the MKCD is as
follows:

    *i               *u
    *ii              *uu
    *e      *ə       *o
    *ee     *əə      *oo
            *a       *ɔ
            *aa      *ɔɔ
    *iə    [*ɯə]     *uə
    *ai

Shorto tables the correspondences justifying this
reconstruction (excepting the bracketed item) in Table 1 of Part
1 of the MKCD. The system is strictly MK in the sense that it
is derived by comparison of just those two languages, Mon and Khmer. The
resulting reconstruction is generalized to the family as a whole, 
invoking various vowel alternations to account for irregular or 
unexpected correspondences. 

Shorto's lexical reconstruction yields 2,246 groups of 
proto-MK lexemes, plus several hundred Palaungic and South
Bahnaric etymologies (in appendices) for which provisional
PMK antecedents are posited. In most cases the PMK forms 
are justified by the presence of reflexes in at least Mon and 
Khmer, or one MK language and a convincing wider 
comparison (in Shorto's estimation) with Munda and/or 
Austronesian. 

Although not spelt out explicitly in the surviving notes, the 
method Shorto employed in compiling more than 2,500 
etymologies and 30,000 lexical citations is more or less clear. 
It began with extracting nearly a thousand comparisons
first presented by Schmidt in his various foundational studies.
To these were added data from other comparative
studies, specifically Skeat and Blagden (1906), Shafer (1965),
Barker (1963, 1966), Blood (1966) and Smith (1972). Reliance
on the latter two sources accounts for the fact that some 146
items in his PMK lexicon are posited with Bahnaric reflexes
alone.

The framework thus established was further fleshed out
with data from published lexicons and dictionaries, and some 
unpublished field data contributed by colleagues, especially 
from individuals with whom he had direct contact thanks to the 
ICAAL meetings in Hawai'i and Mysore. However, personal 
factors acted to limit the cooperative sharing of data to Shorto's 
disadvantage. According to his daughter Anna (pers. com.), 
Harry was reluctant to have dealings with "missionary types". 
Consequently he did not seek to cultivate personal contacts 
with SIL researchers, and even declined to submit work to 
Mon-Khmer Studies. Instead he worked more and more alone. 

In total the MKCD presents some 30,000 lexical citations 
from numerous languages, in a well organized and accessible 
fashion, making it a marvelous resource for comparative 
studies. For a detailed constructive critical review of the 
reconstruction see Sidwell (2006). 

6.12. Concluding remarks on the history of comparative
reconstruction

In reviewing the last half century or so of comparative MK 
linguistics, we discern a striking pattern that might be crudely 
characterized as the dichotomy of philology versus field work. 
In its extreme realizations, we see the approach that Shorto 
took in reconstructing Proto-MK counterposed to the strictly 
'contemporary data only' approach of Diffloth (1980, 1984 etc.), 
which explicitly downgrades the data that Shorto most relied 
upon. 

Shorto's MKCD is the ultimate test case of the first 
approach, and it does have many evident weaknesses. The very 
method that made the reconstruction possible—the reliance on 
philological/epigraphic data—also limited the scope and 
accuracy of the results. Yet it allowed Shorto to draw a broadly 
accurate picture whose flaws could be readily understood, and 
which may be corrected sooner rather than later.

Diffloth, in contrast, has been the primary proponent of the
non-philological approaches that have been relatively dominant 
since the 1960s. Although his published work has been limited 
to branch-level reconstructions, Diffloth still has the 
opportunity to prove his larger case, if and when he publishes 
his own long-awaited reconstruction of Proto-MK.

But as may be most appropriate for the study of Southeast 
Asia, the middle road is also being successfully navigated. The 
great exemplar in this respect is Ferlus, whose work takes into 
account a wide range of field data collections and analyses, yet 
is also profoundly influenced by his deep knowledge of 
classical languages and written traditions. His distinguished 
analyses of the history of Mon (1983), Khmer (1992b) and 
Vietic (1975, 1982, 2001 etc.) have been based upon a 
readiness to apply all types of available evidence, weigh them 
accordingly, and apply them to solving the problems at hand. 

The conclusion to be drawn from reviewing the past 
century of work on the MK languages is surely that no single 
methodology or resource (and certainly no individual 
researcher!) is sufficient to reconstruct the history of this 
extraordinarily ancient and diverse language family. But by the 
same token, no method, resource, or researcher should ever be 
seen as being marginal enough to be safely ignored. Again and 
again it is the combination of method, resources, and individual 
drive that ultimately leads to lasting results. 

7. Where to Now? 

We have now reached a stage in the course of comparative MK 
linguistics that is remarkably favorable. The extent of the 
family is now fairly well known; there are reliable and 
extensive lexical data available for all branches of the family, 
and there are unlikely to be a score of new MK languages still 
to be found, let alone any new branches. The most diverse 
branches (Bahnaric, Katuic, Palaungic, Aslian, Vietic) have
already been the subject of comparative studies of varying
extents (discussed above), and comprehensive reconstructions 
are either in train or at least already technically feasible. Thus 
the program briefly sketched out by Diffloth (1980:3) is not 
only practical, but with a serious effort could be effectively 
implemented over the next few years. 

However, I do not suggest that we follow Diffloth's 
prescriptions to the letter and pay less regard to the evidence of 
smaller branches, especially Mon and Khmer. Shorto 
unambiguously demonstrates the usefulness and relevance 
(although arguably not the primacy) of these well documented 
and historically analyzed languages for comparative 
reconstruction. It thus seems self-evidently reasonable that the 
insights achieved by Shorto (2006), and Ferlus (especially 
1983, 1992), should be appropriately integrated with the results 
of comparative studies dealing with the most diverse MK 
branches. 

In addition we should look to utilize Khmu' and Pearic 
lexicon in reconstruction. Despite the fact that extensive new 
data is required to recover Proto-Khmuic, the multi-dialectal 
Khmu' lexicon of Suwilai Premsrirat (2002) represents a 
substantial reliable source that can be applied directly to the 
reconstruction of PMK. Regarding Pearic, Suwilai is presently 
consolidating a comparative lexicon based on two decades of 
field work conducted from Mahidol University. Hopefully this 
will usefully inform a Proto-Pearic reconstruction that will 
rework and enhance the framework established by Headley 
(1985). Although the Pearic data are inherently limited due to 
their endangered status, there are indications that Pearic may 
be especially important to understanding PMK phonology (if 
not lexicon 12 ). As Diffloth (1989) discusses, there is the 
unresolved issue of whether to reconstruct a voice quality 
distinction, perhaps a tense/creaky register, in PMK. The 
relevant branches are Vietic, Katuic and Pearic. It is imperative 
that we investigate the historical origins of the tense/creaky
registers in these branches, determining as far as the data will
allow the extent to which they have arisen secondarily, or must 
be treated as archaic. 

Another crucial aspect of Pearic concerns its ambiguous 
classification within MK. There are no clear indications of 
Pearic sub-grouping with any other branch. Therefore, Pearic 
potentially reflects the highest branching node in the tree 
(below Munda) and may thus be indispensable to 
reconstructing PMK. By contrast, if Nicobaric sub-groups with
Monic, and Khasic sub-groups with Palaungic, these two
branches are arguably less important. Their data may be better
dealt with later, after attention to the larger branches has
further informed our understanding of the past and refined our 
recognition of the gaps. Currently available published data for 
these two branches is rather limited: 

• Khasic: there are dictionaries of Standard Khasi (e.g. 
Singh 1906, Bars 1973), and contemporary descriptive 
studies such as Nagaraja (1985), but we need extensive 
dialect data 13 to facilitate a Khasic reconstruction (I am 
sure that more exists than I am aware of);

• Nicobaric: there are some older dictionaries (e.g. Man
1888/1889, Roepstorff 1884, Whitehead 1925), and
descriptive analyses (e.g. Radhakrishnan 1981), but 
more descriptive work is urgently required, especially 
since the recent devastating tsunami. 

Given that the technical conditions for substantial progress 
in comparative reconstruction are largely satisfied, the 
remaining variable to consider is the human factor. The sheer 
scope of the challenge of dealing with a family of 150 or so 
languages is such that no one can or should try to do 
everything; collaboration is ultimately essential. This has been 
the weakest link in our field over the last 30 years,
characterized by a lack of dedicated conferences, joint
publications, and the lack of a clearly articulated programmatic
vision around which efforts can be coordinated and motivated.

Can we work together to accelerate the pace of progress?
Yes, by pursuing any strategy that promotes collaboration 
between researchers, including sharing data, cooperating in 
field work, hosting regular conferences and publications, and 
committing to open discussion of the methodological and 
programmatic aspects of our work. 

With an appropriate collaborative effort I am sure that we 
can see the emergence of a consolidated Proto-MK 
reconstruction and comprehensive etymological dictionary in 
years, rather than in decades. It is my earnest wish that the 
revival of these ICAAL meetings we are participating in today 
will contribute concretely to the realization of such a project. 

NOTES 

1. The author gratefully acknowledges the support of the Mon- 
Khmer Languages Project by the National Endowment for 
the Humanities. Any views, findings, conclusions, or 
recommendations expressed in this publication do not 
necessarily reflect those of the NEH. Thanks are also due to 
Doug Cooper and Tom Sidwell for their helpful comments 
on earlier drafts of this paper. 

2. The 1651 Vietnamese-Portuguese-Latin dictionary of de 
Rhodes being an especially early example. 

3. For example, Peiros (1996) was influenced heavily by 
Schmidt's ideas in his unsuccessful morphological analysis of
Katuic, and under his influence the present writer's South
Bahnaric (Sidwell 1999) and West-Bahnaric (Jacq & Sidwell
2000) reconstructions followed suit.

4. That paper was written specifically to refute Dyen's (1965) 
classification that suggested a Micronesian homeland for 
Austronesian. By implication it also refuted the idea, 
expressed by Shorto from time to time, of an Indo-Chinese 
homeland, which would have reconciled more easily with the 
Austric hypothesis. 

5. Not listed in Thomas' bibliography.

6. Not listed in Thomas' bibliography.

7. It is likely that those index cards were eventually entrusted to 
Harry Shorto, but were lost when SOAS cleaned out his 
office after his retirement in the 1980s. 

8. The Luce Papers are now held in the manuscripts collection 
of the Australian National Library. The Mon-Khmer and 
Sino-Tibetan lexical materials have been scanned by CRCL 
and indexed, and are available on-line at 
http://archives.sealang.net/luce.

9. Palaungic may be comparable in diversity if, as appears 
likely, the Mangic/Pakanic languages of Vietnam and China 
belong to the Waic sub-group (Nguyen Van Lo'i, pers. com. 
8/10/07). 

10. Electronic copies of the unpublished papers of Harry Shorto 
mentioned in this paper can be obtained directly from Paul
Sidwell (paulsidwell@yahoo.com).

11. Perhaps better called "Muong-Viet", to eliminate confusion 
with "Vietic". 

12. The Pearic languages are all well into late stages of 
endangerment, such that it is not possible to collect extensive 
lexicons unaffected by contact languages, restricting their 
usefulness for lexical reconstruction. 

13. In this conference a paper was presented which explained the 
ongoing ambitious project on Khasi Dialects Survey, 
conducted by Central Institute of Indian Languages, Mysore. 
The first draft of the first stage is shortly awaited. [Eds].

Doug Cooper

1 CRCL and the Mon-Khmer Languages Project gratefully acknowledge the support of the
National Endowment for the Humanities. Any views, findings, conclusions, or
recommendations expressed in this publication do not necessarily reflect those of the NEH.

The Mon-Khmer Languages Project is a broad plan to support 
research in comparative linguistics and lexicography. It was 
created to provide a practical means for sharing lexicographic 
data and comparative analysis, including both confirmed and 
edited results, and the ‘dark matter’ of working data and partial 
results that are, in some cases, our only available resources. 
The Project provides two linked, Web-accessible resources 
with the usual array of search and presentation tools: 

• The Mon-Khmer languages database is an on-line store of 
lexicographic data. Drawn from both published and 
unpublished sources, the database will ultimately provide a 
snapshot of relevant (for comparative purposes) knowledge of 
each of the Mon-Khmer languages, including glossing and 
phonetic transcription. 

• The Mon-Khmer etymology database serves a similar role 
for analysis. It will initially be based on data extracted from 
Shorto’s Mon-Khmer Comparative Dictionary (2006); the most 
extensive such resource, and a fitting starting point for this 
effort. 

The MKL Project is intended to be both accessible and 
extensible. ‘Source filtering’ lets resource sets be defined as 
narrowly or broadly as desired; for example, searches might 
include only data from a particular dictionary, or incorporate 
all data available for a given language. However they are 
defined, resource sets can also be extracted and downloaded 
for off-line research. New datasets that follow a simple XML
tagging protocol can also be added to the MKL Project
databases. Every item is identified by its contributor’s name,
so the obvious issue of quality control is dealt with in a
transparent, elegant manner: source filtering can include, or
just as readily exclude, any individual’s contributions. Thus,
only sources the user trusts, or items that have been vetted by
scholars the user trusts, will actually figure in any response to
user queries.
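
The source filtering described above can be pictured as a simple filter over contributor names. The sketch below is only an illustration under stated assumptions: the record fields, the function name and the placeholder data are hypothetical and do not describe the project's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalItem:
    """One database entry; the field names are assumed for illustration."""
    form: str
    gloss: str
    language: str
    source: str
    contributor: str


def filter_items(items, trusted=None, excluded=()):
    """Keep items whose contributor is trusted and not excluded.

    trusted=None means every contributor is acceptable unless excluded.
    """
    excluded = set(excluded)
    trusted = None if trusted is None else set(trusted)
    return [
        item for item in items
        if item.contributor not in excluded
        and (trusted is None or item.contributor in trusted)
    ]


if __name__ == "__main__":
    # Placeholder records, not real lexical data.
    items = [
        LexicalItem("form-a", "gloss-a", "Language A", "Dictionary 1", "contributor_1"),
        LexicalItem("form-b", "gloss-b", "Language A", "Field notes", "contributor_2"),
    ]
    for item in filter_items(items, excluded=["contributor_2"]):
        print(item.form, item.contributor)
```

Because every record carries its contributor, the same function serves both to narrow a search to vetted sources and to leave out a single individual's contributions.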

The Mon-Khmer Languages Project is, above all, a 
collaborative venture. We have received wide support in the 
linguistics community in planning and acquiring initial data for 
the project, and generous funding from the U.S. National 
Endowment for the Humanities in launching it as of May
2007. I look forward to describing the project’s
implementation, and to soliciting advice and comment on how 
it can best meet its goal of enabling timely sharing of data and 
analysis by Mon-Khmer language researchers. 

Please note that this discussion paper describes a preliminary 
system that is still in the initial stages of development and has 
not yet been publicly released. Both unpublished data and 
partial subsets of published data are being used for illustrative 
purposes. The author takes full responsibility for any 
misrepresentation or error. 

1. Introduction 

The Mon-Khmer language family is the larger subgroup of 
the Austroasiatic stock; the Munda languages, spoken 
primarily on the Indian subcontinent, form the other. The 
roughly 150 Mon-Khmer languages are of great antiquity, 
major linguistic interest, and primary importance for the study 
of Southeast Asian history and culture. Mon-Khmer languages 
are the national languages of Vietnam and Cambodia, and are 
found in communities large and small in India and China, and 
across broad swaths of Burma, Malaysia, Laos, and Thailand. 

As might be expected, the historical depth and geographical 
diversity that make the Mon-Khmer languages so central for 
linguists, historians, archeologists, and other academic
specialists have also made it extraordinarily difficult to gather
the broad set of lexicographic resources required for detailed
comparative work. Data has been gathered for more than a
century, but not all of it has been formally published, and none
of it has been available in electronic form. 

The Mon-Khmer Languages Project was developed in order
to address this issue: to collect and digitize the widest possible 
range of lexicographic and comparative resources. First 
announced in 2004, the project received initial two-year 
funding from the U.S. National Endowment for the Humanities 
in 2007, following our (Sidwell, Cooper, and Bauer) editing 
and publication of Shorto’s Mon-Khmer Comparative 
Dictionary (Shorto 2006). It has received broad support from
the field, with pledges of data and assistance coming from
around the world. 

The project has three focal points. It will create: 

• a Mon-Khmer languages database that makes all language
reference materials, including phonetic transcription, glosses,
and citations, freely available. We anticipate compiling initial
datasets representing all Mon-Khmer branches in the first two
years.

• a Mon-Khmer etymological database that provides an on-line
hierarchical reference that puts language data in context. It will
be based on - and ultimately extend greatly - Shorto (2006).

• a collaborative worksite for Mon-Khmer language research, that
provides an architecture for extension, comment, and correction
of language and etymological data.

This paper describes the operation of the databases, and 
begins to document the mechanisms that will be provided for 
data sharing in the Mon-Khmer Languages Project. 

For the first two years of the project we will treat this as a 
preliminary specification, which may be modified in response 
to the needs of the linguistics community. Of necessity, the 
overview provided here will be supplemented by more detailed 
documentation of technical issues, including in particular
project standards for etymological markup, and the design and
implementation of phonological search. 

We begin section 2 with a brief survey of relevant 
literature, then introduce some terminology, and discuss 
certain design considerations that impact on making data 
available as rapidly as possible. We describe the language and 
etymology databases from the user perspective in sections 3 
and 4. Section 5 discusses how data may be accessed and 
redistributed, while section 6 introduces the underlying coding 
of database contents. Section 7 deals with additions to the 
database. Finally, section 8 gives an overview of current and 
planned database contents. 

2. Preliminaries 

We open by briefly citing relevant research and references, 
then discuss initial design considerations. 

2.1. Previous Work 

The only general reference to Mon-Khmer etymology is 
Shorto’s ambitious Mon-Khmer Comparative Dictionary
(Shorto 2006). Prior to this, only branch and sub-branch level 
reconstructions had been attempted; these include North 
Bahnaric (Smith 1972), Mnong (Blood 1966), East-Katuic 
(Thomas 1967), Viet-Muong (Barker 1963, Barker & Barker 
1970), Jeh-Halang (Thomas & Smith 1967), Semai (Diffloth 
1977), Waic (Diffloth 1980), Monic (Diffloth 1984), South 
Bahnaric (Sidwell 1998), Katuic (Sidwell 2005), and Vietic 
(Ferlus ms.). 

Etymological projects of regional interest that have some 
digital component include three initiated in the mid-1980’s: the 
Sino-Tibetan Etymological Dictionary and Thesaurus (STEDT, 
1987), the Austronesian Comparative Dictionary (ACD, 1990), 
and the Munda Lexical Archive (MLA, initially funded by NSF 
for Sora in 1979, completed 1985); access points for the latter 
two are noted below. Earlier works that have since been 
digitized (although purely as text references under the auspices 
of the Digital Dictionaries of South Asia project) include 
extensive analyses of Dravidian (Burrow and Emeneau 1964) 
and Indo-Aryan (Turner 1966-69). Work in the Tai family is 
not available in any digital form, and includes Li Fang Kuei’s 
proposed reconstruction of proto-Tai (1977) and Ostapirat’s 
reconstruction of proto-Kra (Ostapirat 2000). 

Handling of etymological data for computer database 
applications has received relatively little interest. Crist (2005) 
provides a good survey of the topic, and of the minimal 
consideration for etymological markup given by existing 
systems. His proposed model includes the definition of 
hierarchically tagged cognate sets with explicit specification of 
word values and etymon/reflex relations, and uses accepted-by 
and rejected-by tags as a way of approaching the problem of 
encoding confidence levels. Other useful references include 
Good & Sprouse (2000), Bell & Bird (2000), Ide et al (2000), 
and Wittenburg et al (2002). 

In recent years considerations of preservation and reuse of 
data have begun to occupy an increasingly important role in 
project planning. The Electronic Metastructures for 
Endangered Languages Data (E-MELD, emeld.org) has been 
extremely influential in promoting basic ‘best practices’; see 
also Bird & Simons (2003). The Open Language Archives 
Community (OLAC, www.language-archives.org) and the 
Open Archives Initiative (http://www.openarchives.org) 
focus on metadata harvesting. Other efforts include the 
Documentation of Endangered Languages project (DoBeS, 
http://www.mpi.nl/DOBES), and the Pacific and Regional 
Archive for Digital Sources in Endangered Cultures 
(PARADISEC, http://paradisec.org.au). 

On-line resources for comparative linguistics are rare, and 
include the Tower of Babel etymological database project 
(http://starling.rinet.ru/main.html), and the Indo-European 
Etymological Dictionary (IEED, http://www.indo-european.nl) 
at Leiden University. Others mainly provide parallel 
lexicographic resources, and include the Austronesian Basic 
Vocabulary Database 
(http://language.psy.auckland.ac.nz/austronesian), the 
Intercontinental Dictionary Series (IDS, 
http://lingweb.eva.mpg.de/ids), and the Munda Lexical Archive 
(http://www.ling.hawaii.edu/faculty/stampe/aa.html). 

2.2. Author and Item Identification 

As we will see below, the Mon-Khmer Languages database 
differs from typical approaches to storing and redistributing 
lexical data. Rather than clustering data into groups that 
essentially mirror the layout of paper dictionaries (sometimes 
referred to as the editorial view), we instead reduce the entire 
dataset to three basic elements: 

items are citations or reconstructions, and may include 
orthography, phonemic rendering, glosses, and other lexical 
information; 

links encode the relationship between items; e.g. they point 
citations to reconstructions; 

notes may comment on items, links, or other notes. 

The resultant structure is highly suited to representing the 
genealogical trees associated with etymological data, yet 
lexical data is readily reused simply by adding or ignoring 
linking data. The unique identification of every element is an 
essential requirement. We define two necessary terms: 

authorID: all sources are identified by an abbreviation that 
follows the form XxxYYYY: 

Xxx is a three-letter abbreviation of the author’s name, and is 
not case sensitive; 

YYYY is the year of publication. Preliminary material that is 
intended for discussion but not citation is dated YxxY, e.g. 2xx7. 

itemID: within the project database, every data item, including 
citations, reconstructions, links, and notes, is uniquely 
identified using this basic format: 

authorID:X:number 

X gives the itemID type and is one of the letters C (citation), 
R (reconstruction), L (link), or N (note), discussed below. 

number is usually the entry number in the original source. If the 
original author has an identifying letter or number (or 
combination) it is used instead. In both cases, the 
subcomponents of an entry are numbered sequentially, e.g. 17-2 
or V215-1. 

For example, Sin1906:C:24 is the twenty-fourth citation in U. 
Nissor Singh’s Khasi-English dictionary (Singh 1906), while 
Sho2006:R:24 is the twenty-fourth reconstruction set in 
Shorto’s comparative dictionary of Mon-Khmer. 
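
The identifier format lends itself to mechanical checking. The following is a minimal sketch, not project code: the regular expression, the ItemID tuple, and the function name parse_item_id are our own illustrative assumptions, built only from the authorID:X:number description above.

```python
import re
from typing import NamedTuple

# Assumed reading of the format: three letters, a four-character year whose
# middle two characters may be "xx" (preliminary material, e.g. 2xx7), a type
# letter, and a free-form entry number such as 24, 17-2 or V215-1.
ITEM_ID = re.compile(
    r"^(?P<author>[A-Za-z]{3})(?P<year>\d(?:\d\d|xx)\d)"
    r":(?P<kind>[CRLN]):(?P<number>\S+)$"
)

KIND_NAMES = {"C": "citation", "R": "reconstruction", "L": "link", "N": "note"}


class ItemID(NamedTuple):
    author: str  # normalized to "Xxx"; the abbreviation is not case sensitive
    year: str
    kind: str
    number: str


def parse_item_id(text: str) -> ItemID:
    """Split an itemID into its parts, or raise ValueError if it does not fit."""
    match = ITEM_ID.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognizable itemID: {text!r}")
    return ItemID(match["author"].capitalize(), match["year"],
                  match["kind"], match["number"])
```

For instance, parse_item_id("Sin1906:C:24") yields the author Sin, the year 1906, the type C, and the entry number 24.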

2.3. Other Preliminary Design Issues 

A central consideration has been to make the data collected 
under the auspices of the project available to the broader 
community as rapidly as possible. A certain degree of tension 
results from balancing the desire for immediate access with the 
long-term goals of providing a retrospective survey of existing 
data, and incorporating it into a fully documented comparative 
analysis. Points of temporary compromise include: 

normalizing data: Although we use Unicode for character 
encoding, there is still inconsistency in the conventions used for 
both orthographic and phonemic data. For the moment, we 
preserve original formats. Note that data may still be exposed 
by searching via the original author or publication even if 
orthographic or phonemic data do not adequately conform to 
search tools designed for the project as a whole. 

partial etymological grouping: Lexicographic (as opposed to 
truly comparative) sources, as well as preliminary field notes, 
do not necessarily posit higher-order grouping of data into 
etymological sets. Where possible, we insert partial information 
to help group individual data items; e.g. adding dummy root 
entries as described in section 6.3.1. Of necessity such additions 
are incomplete; in the long run the dummy entries will be 
replaced by pointers to the Mon-Khmer etymological tree. 

selective acquisition: In some datasets a very large number of 
citations from different dialects or sources within a single 
language are provided. In some cases we have selected 
representative items in order to populate the database, and will 
return later to complete data collection. 


3. The Mon-Khmer Languages Database 


The languages database is a collection of published and 
unpublished texts, comprised of ordinary dictionaries as well 
as etymological and comparative works. All data in the 
languages database is also accessible via the etymology 
database; however, the languages database is more suitable for 
viewing or extracting datasets, as opposed to searching across 
datasets. The basic user interface is shown below: controls are 
on the left, and results are on the right. 


[Figure: screenshot of the Mon-Khmer Languages Database. The 
control panel on the left lists dictionaries and lexicons by 
author, including Banker et al.'s Bahnar Dictionary and Diffloth's 
The Wa Languages (1980), with item counts for each language and 
dialect. The results panel on the right lists Diffloth 1980 
entries: proto-Waic and proto-Wa-Lawa reconstructions, each 
followed by its Wa, Lawa, and Samtau citations.] 


In this case, we have chosen the show link from the entry 
for Diffloth (1980); this reformats and displays all Diffloth 
1980 entries currently in the database (which is still 
incomplete). Other links provide formal citations for each text 
in a variety of formats, as well as PDF or DjVu files of the 
original published source. 


[Figure: two screenshots of the languages database listing, by 
author or source (left) and by language and dialect (right).] 

The contents of the languages database may be listed by 
author or source (as seen above left), or by language and 
dialect (above right). For convenience, all dialects cited by a 
particular author are both aggregated and individually 
selectable. Above right, note that Kui has three major sets (one 
listed as “Kuy”) consisting of 640, 618, and 658 items 
respectively. The second set draws on four sources, which can 
be inspected separately. Checking any set of boxes (such as the 
Kui and Kuy group) lets us print a merged lexicon, as seen 
below sorted by IPA: 


[Figure: merged Kui and Kuy lexicon sorted by IPA, shown beside 
the language selection list.] 





The languages database can be used both for on-line 
browsing and for extracting and downloading (possibly 
merged) datasets. Data may be returned in four formats: 


HTML is best for on-screen viewing, or for re-use in Web pages. 
The text is tagged using standard HTML tables; a CSS stylesheet 
is embedded in the page. 

XML returns data marked with the tagset used internally and 
described in section 5.1. 

Text returns data as tab-separated values, without tagging. 

JSON returns data in JavaScript Object Notation, and is intended 
to be called programmatically in order to be incorporated into 
Web pages (not yet available). 

Data may be sorted by IPA, orthography (if available), gloss 
(ignoring leading punctuation), or internal ID number. 
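
Sorting by gloss while ignoring leading punctuation can be sketched as follows. This is an illustration under our own assumptions: rows are taken to be dictionaries with a "gloss" key, and the set of characters treated as punctuation (ASCII punctuation plus curly quotes) is a guess rather than the project's definition.

```python
import string

# Characters stripped from the front of a gloss before comparison (assumed set).
_LEADING = string.punctuation + string.whitespace + "‘’“”"


def gloss_sort_key(gloss: str) -> str:
    """Return the gloss without leading punctuation, folded for comparison."""
    return gloss.lstrip(_LEADING).casefold()


def sort_by_gloss(rows):
    """Sort rows (dicts with a 'gloss' key) by their cleaned gloss."""
    return sorted(rows, key=lambda row: gloss_sort_key(row["gloss"]))
```

With this key, an inferred gloss such as {rat} sorts under r rather than under the opening brace.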

4. Mon-Khmer Etymological Database 

The etymological database fulfills a dream long held by 
Southeast Asian linguists - a resource that can begin to help 
unravel the tangled web of language influence and 
development in the region. 

The basic user interface is shown below. Controls are on 
the left and bottom, while results are returned to the upper 
right of the screen. We see the results of a search for the word 
rat in any gloss. Results are shown in sets, ordered from the 
earliest reconstruction to modern-day reflexes. 


[Figure: screenshot of the Mon-Khmer Comparative Dictionary 
search interface. A search for rat in any gloss returns sets 
headed by Sho2006:R:93 rat, mouse (Proto Mon-Khmer) and 
continuing through Diffloth's proto Monic, proto Nyah Kur, and 
proto Mon reconstructions to Nyah Kur and Mon citations. Search 
and return options are on the left; the lower panel holds the IPA 
character picker, with consonant and vowel tables.] 


A variety of other options are supplied. For example, set 
ordering can be reversed and displayed from reflex to 
reconstruction by setting an alternative return option: 


Dif1984:C:N1-1 khsnii? rat (Nyah Kur [Central]) 

Dif1984:R:N1.B *khnii? {rat} (proto Nyah Kur) 

Dif1984:R:N1 *knii? {rat, mouse // rat} (proto Monic) 

Sho2006:R:93.A *kiij[i]2 rat, mouse (Proto Mon-Khmer [A]) 
Sho2006:R:93 rat, mouse (Proto Mon-Khmer) 


The lower part of the screen shows an innovative 
mechanism for constructing phonemic searches. Both the 
consonant and IPA tables are similar to the standard IPA 
charts, but are intended to be more useful for Southeast Asian 
practice. Items may be selected individually or in sets. For 
example, clicking on “S+” in the Fricative box at the lower left 
returns the set 6sfggx c jf ch , which may then be used as part of 
a search target. Similarly, the U? set in the vowel panel returns 
the set ?adiu, which is designed to specify an optional 
unstressed vowel. 


An alternative lower panel is shown below. It provides the 
option of either including or excluding particular authors and 
sources, and may be used to provide language restrictions as 
well. Note in particular the Exclude only ... choice All but 
ancestors, which lets searches be confined to a particular 
subset of modern languages or intermediate reconstructions, 
while still allowing linked items further up the Mon-Khmer 
tree to be returned. 


[Figure: the alternative lower panel, with four lists: Search 
only, Exclude only, Language restriction, and Reanalysis. The 
first two list comparative sources (Shorto, Diffloth, Ferlus, 
Sidwell) and language sources (Banker, Huffman, Man, Singh); the 
Language restriction list gives proto-languages with item counts; 
the Reanalysis list controls whose additions are allowed.] 


The search only, exclude only, and reanalysis controls 
enable one of the project’s fundamental design goals: that it be 
open to all user contributions, subject to the limitations of good 
taste and common sense. Such contributions may include both 
sets of lexical data - citations or reconstructions - and analysis 
of the historical relations between existing items. 

In effect, the database is rebuilt every time it is consulted, 
and is thus able to take into account any additional links, 
reconstructions, or commentary. The same mechanism that 
allows this flexibility also helps ensure that database users can 
control just whose contributions are reliable enough to be 
incorporated into the data set. Every relation (and every 
reconstruction, comment, and citation) is marked with the 
identity of its contributor. The user can choose to include 
and/or exclude any subset of analysis (and data) providers. 
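
The contributor filter described above can be pictured with a short sketch. It assumes, for illustration only, that each record is a dictionary whose "id" value is an itemID, so that the contributor is the authorID before the first colon; the function name select_records is hypothetical.

```python
def select_records(records, include=None, exclude=()):
    """Keep records whose contributor passes the include/exclude choices.

    include=None means "include all"; authorIDs are compared without regard
    to case, since the author abbreviation is not case sensitive.
    """
    excluded = {author.lower() for author in exclude}
    included = None if include is None else {author.lower() for author in include}
    kept = []
    for record in records:
        author = record["id"].split(":", 1)[0].lower()
        if author in excluded:
            continue
        if included is not None and author not in included:
            continue
        kept.append(record)
    return kept
```

Because the filter runs over the full record set on every query, a newly contributed link or note takes part in the result as soon as its contributor is included.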

5. Access to Data 

A key design goal of the Mon-Khmer Languages Project is to 
expose project data both interactively (via the user interfaces 
just seen), and programmatically, using the kind of application 
programming interface known as a Web API. The Web API 
reveals all of the functionality of the user interface to 
program-generated queries, and to URIs that may be embedded 
in texts (and are discussed below). 

5.1. The Web API 

A Web API defines a set of queries and responses that can be 
managed using the same HTTP protocol that Web browsers 
rely on. A key feature of this design is that it is stateless, and 
does not require that the query-response connection be 
managed or maintained. A type of link called a URI, which is 
similar to the more familiar URL but can point to abstract 
instances of data (as opposed to very real Web pages) is all 
that is required. 

Both the languages and etymological databases are 
accessible in this manner. Queries are handled by SEAlang’s 
default request URL, http://api.sealang.net. A typical request 
URI is: 

http://api.sealang.net?resource=monkhmer&show=full&text=rat 

The show link that helped generate a screen capture seen 
earlier used this very method: 

http://api.sealang.net?resource=monkhmer&show=full&include=Dif1980&exclude=all 

The preliminary specification for request attributes is 
shown in Table 2. 
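
A request URI of this kind can also be assembled by a program. The sketch below only builds the string and makes no network call; the helper name build_request is our own, and the attribute names are those of the preliminary specification, which may change.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.sealang.net"  # default request URL given in the text


def build_request(**attributes) -> str:
    """Build a request URI; list values are joined with '|' as in Table 2."""
    params = {"resource": "monkhmer"}
    for name, value in attributes.items():
        if isinstance(value, (list, tuple)):
            value = "|".join(value)
        params[name] = value
    # Keep '|' and ':' readable instead of percent-encoding them.
    return BASE_URL + "?" + urlencode(params, safe="|:")


print(build_request(show="full", text="rat"))
```

Passing include=["Dif1980", "Sho2006"] would produce include=Dif1980|Sho2006 in the query string.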

The response to a Web API query may be provided in 
several forms, including HTML, plain text, and XML. Again, 
this matches the database’s on-screen functionality. How this 
response is handled is up to the user: HTML output may be 
redirected to a window or frame, text output may be saved to a 
file directly or via copy-and-paste, or XML output may be 
reprocessed and repackaged. 

A point that is essential for non-programmers to grasp is 
that using the Web API need be no more difficult than 
embedding a link into a Web page, or typing the URI into a 
Web browser. Indeed, we expect that two particular 
applications will become commonplace: using URIs to specify 
references, and using browsers to copy-and-paste dictionary 
texts. 

The XML response to a Web API query matches the 
tagging scheme to be discussed in section 6, and is outlined in 
Table 3. This is a preliminary specification, and was developed 
largely in the context of encoding the data found in Shorto 
(2006), a task which is not yet complete. The tag, attribute, and 
value sets may be modified in response to discussion within 
the linguistics community. 


Request URL 

http://api.sealang.net 

http://api.sealang.net?resource=monkhmer&attribute=value pairs 


Table 1. Request URL and example. 
Request Attributes 

Attribute 
[=default]       Values                 Effect 

resource         monkhmer               use the Mon-Khmer databases 

show [=self]     full                   derive and return the complete 
                                        etymological set, ordered from 
                                        etymon to reflex 
                 self                   return only the matched item 
                 parents                return the matched item and direct 
                                        ancestors, ordered from etymon to 
                                        reflex 
                 [...]                  return the matched item and direct 
                                        ancestors, ordered from reflex to 
                                        etymon 

phone            any                    the phonemic value to search for 

orth             any                    the orthographic value to search for 

text             any                    a string value found in the definition 

id               AuthorID:[RCLN]:#      match a specific item’s ID 

include [=all]   all                    one or more |-separated AuthorID 
                 AuthorID               values 
                 AuthorID|AuthorID... 

exclude [=none]  none 
                 ancestors 
                 AuthorID ... 

lang             name                   restrict the search to one or more 
                 name|name              languages 

dialect          name                   restrict the search to the dialect, 
                                        toponym, or source name conventionally 
                                        used to denote a language subset 


Table 2. Request attributes. These are the parameters supplied with 
any use of the Web API. They reflect the functionality supplied by 
the interactive Web pages. 
Response Tags 

tag          subtag       attribute=   value              explanation 

<item ...>                                                a citation (type C) or 
content                                                   reconstruction (type R). 
</item> 
                          id           itemID, type       the item’s unique identifier 
                                       :C: or :R: 
                          lang         language name      language or proto-language 
                                                          name 
                          dialect      dialect name       commonly used dialect, 
                                                          toponym, or source name 
             <ipa ...>                                    IPA representation 
                          [...]        string             original, non-IPA 
                                                          representation 
             <orth ...>                                   orthography as supplied 
                          script=      name               an identifier for orthography 
                                                          if necessary 
             <gloss ...>                                  definition or gloss 
                          pos=         string             part of speech 
             <refer>                                      reference 

<link ... />                                              a link between two <item> 
(content: none)                                           entries 
                          id           itemID of type     the item’s unique identifier 
                                       :L: 
                          f[rom]       itemID             the historically more recent 
                                                          item 
                          t[o]         itemID             the historically more removed 
                                                          item 
                          type         reflex             reflex: direct dependent of. 
                                       instance           instance: realization of a 
                                                          (possibly unstated) 
                                                          reconstruction. 
                                       phon(ological)     phon, deriv, inflect: 
                                       deriv(ational)     indirect relations. 
                                       inflect(ional) 
                                       compound           compound: unanalyzed 
                                                          compound form. 
                                       loan               loan: borrowing to/from. 
                          subtype      assimilation       a more detailed note, 
                                       dissimilation      typically associated with 
                                       metathesis         a phonologically or 
                                       lenition           derivationally related item. 
                                       fortition          Values of the type and 
                                       sandhi             subtype attributes are 
                                       leveling           limited, but not closed. 
                                       epenthesis         These are practical sets, 
                                       elision            based on actual reference 
                                       affix              literature, and are not 
                                       back(-formation)   intended to define formal 
                                       secondary(-deriv)  ontologies. 
                          certainty    uncertain          certainty of the analysis. 
                                       unlikely = 0.25    “uncertain” does not have 
                                       possible = 0.5     a specific numerical value. 
                                       likely = 0.75 
                                       probable = 0.9 

Table 3. XML elements used to tag responses. These reflect the tags 
used in the internal database, and will eventually be required for new 
submissions. Note that the <link/> tag has only attributes, and no 
contents. As a general rule, controlled-vocabulary data is stored in 
attributes, with free text between tags. 
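
The certainty values in Table 3 suggest one way a client might screen links. The sketch below is an assumption-laden illustration: links are represented as dictionaries of their attributes, a link with no certainty attribute is treated as asserted without reservation, and "uncertain", which has no numerical value, is always dropped. None of these choices is specified by the project.

```python
# Numerical values taken from Table 3; "uncertain" deliberately has none.
CERTAINTY = {"unlikely": 0.25, "possible": 0.5, "likely": 0.75, "probable": 0.9}


def certainty_value(label):
    """Return the numerical value for a certainty label, or None if it has none."""
    return CERTAINTY.get(label)


def confident_links(links, minimum=0.75):
    """Keep links with no stated certainty, or with a value of at least minimum."""
    kept = []
    for link in links:
        label = link.get("certainty")
        value = certainty_value(label)
        if label is None or (value is not None and value >= minimum):
            kept.append(link)
    return kept
```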


6. The Mon-Khmer Database 

It is convenient to think about the language and etymological 
databases as distinct entities. In some ways this is true. Each 
has its own Web interface, and each has distinctive 
applications. They are also conceptually distinct: one consists 
of attested or proposed citations with glosses and phonemic 
and/or orthographic realizations, while the other is primarily 
composed of relations between citations, and commentary 
about the nature of the relationships. 

In reality, though, there is only one underlying set of data. 
It contains: 


items, which are citations or reconstructions, and may include 
orthography, phonemic rendering, and glosses; 

links, which encode the relationship between items; and 

notes, which may comment on items, links, or other notes. 


The internal format of items, links, and notes reflects the 
response-tag specification shown in Table 3. While other 
details such as timestamps may be employed locally, one can 
assume that any format returned by the database will be 
acceptable as input to the database. 


6.1. Lexicographic Data 

Within the database, all data is held in plain text files, which 
can be opened and read using ordinary text editors, and do not 
require any specialized database software. As a rule, texts are 
initially received (or are typed by the project) using traditional 
positional formatting: 

orthography /phonetic/ part-of-speech ‘gloss’ commentary 

We then tag the data using a simple XML tagset that marks the 
type and boundary of each part of the entry in a transparent 
manner. Conceptually, the first step of the tagging process is: 

<entry> 

<orthography>orthography</orthography> 

<phonetic>phonetic</phonetic> 

<pos>part of speech</pos> 

<gloss>gloss</gloss> 

<note>note</note> 

</entry> 

As a practical matter the tagset is more richly annotated, as 
we saw in Table 3. This lets us preserve additional information 
relating to authorship, original identification or numbering, 
language dialect and/or source, citations, references, and the 
like. In final form, a typical lexicographic entry looks like this: 

<item id="Ban1979:C:48-1" lang="Bahnar" dialect="Pleiku"> 
<ipa>potaŋ</ipa><orth>po'tang</orth><gloss>a boil</gloss></item> 

Additional notes, if any, may point to items. The note’s id 
field gives its source, usually within a larger set of notes, and 
the itemID of the entry it r(eferences). 

<note id="Coo2007:N:1" r="Ban1979:C:48-1"> 

This is a note about Banker, Bahnar, or boils. 

</note> 

This separation of data and commentary would also apply 
to any note that might have appeared in the original text. Our 
intention is to separate specific language data - which might be 
reused somewhere else - from commentary about the data. But 
the commentary is not lost; it can always be recovered by 
tracking the reference by its itemID. 
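
Recovering the commentary for an item can be sketched with a standard XML parser. The sample is an abridged copy of the entry and note above; the wrapping records element and the function name index_records are our own assumptions, since the text does not specify how a response is enclosed.

```python
import xml.etree.ElementTree as ET

# Abridged sample; the <records> wrapper is assumed for illustration.
SAMPLE = """<records>
  <item id="Ban1979:C:48-1" lang="Bahnar" dialect="Pleiku">
    <orth>po'tang</orth><gloss>a boil</gloss>
  </item>
  <note id="Coo2007:N:1" r="Ban1979:C:48-1">
    This is a note about Banker, Bahnar, or boils.
  </note>
</records>"""


def index_records(xml_text):
    """Return (items by itemID, note texts grouped by the itemID they reference)."""
    root = ET.fromstring(xml_text)
    items = {item.get("id"): item for item in root.iter("item")}
    notes = {}
    for note in root.iter("note"):
        notes.setdefault(note.get("r"), []).append((note.text or "").strip())
    return items, notes


items, notes = index_records(SAMPLE)
for item_id, item in items.items():
    print(item_id, item.findtext("gloss"), notes.get(item_id, []))
```

The language data can thus be reused without the notes, while the notes remain reachable through the itemID held in the r attribute.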

6.2. Etymological Data 

Entries in the etymological database are handled in a similar 
manner. Again, the overarching intent is to separate reusable 
data from commentary. 

reconstructions follow the <item> format used for citations, 
but the itemID has type R: rather than C:. This distinction is 
useful in searching the database. 

links include f(rom): and t(o): attributes, and connect citations 
to reconstructions and derivative items to citations. The type 
(and possibly subtype) attributes define the nature of the 
relationship. 

notes may add additional commentary. Typically they refer to 
the certainty of reconstructions, the nature of links, and the 
sources of citations. 

For example: 

<item id="Dif1984:N1:R" lang="proto Monic"><ipa>*kiin?</ipa></item> 

<item id="Dif1984:N1:R.B" lang="proto Nyah Kur"><ipa>*khnji?</ipa></item> 

<link f="Dif1984:N1:R.B" t="Dif1984:N1:R" 
id="Dif1984:N1:L-2" type="reflex"/> 

<item id="Dif1984:N1:R.A" lang="proto Mon"><ipa>*[k]hnp9?i?</ipa></item> 

<link f="Dif1984:N1:R.A" t="Dif1984:N1:R" 
id="Dif1984:N1:L-3" type="reflex"/> 

<item id="Dif1984:N1:C-1" lang="Nyah Kur" 
dialect="Central"><ipa>kh9nji?</ipa><gloss>rat</gloss></item> 

<link f="Dif1984:N1:C-1" t="Dif1984:N1:R.B" 
id="Dif1984:N1:L-4" type="reflex"/> 

<item id="Dif1984:N1:C-2" lang="Mon" 
dialect="Rao"><ipa>noe?</ipa><gloss>rat, mouse</gloss></item> 

<link f="Dif1984:N1:C-2" t="Dif1984:N1:R.A" 
id="Dif1984:N1:L-5" type="reflex"/> 

Note that links point backward, from child to parent (or 
from derivative to source), as opposed to the “headword -> list 
of reflexes” relations traditionally seen in print dictionaries. 
Above, Mon points to proto Mon, which points to proto 
Monic; similarly, Nyah Kur points to proto Nyah Kur which 
also points to proto Monic. From the computational point of 
view, this subtle alteration greatly eases the task of generating 
an internal tree of the relations between items. By recursively 
walking this tree, we can readily identify sisters (they point to 
the same parent), or show historical relations in either order. 
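
The benefit of child-to-parent links can be seen in a small sketch. For readability it uses language names in place of itemIDs, and it assumes that each item points to at most one parent and that the links contain no cycles; the function names are illustrative only.

```python
from collections import defaultdict

# (child, parent) pairs mirroring the Monic example above.
LINKS = [
    ("proto Nyah Kur", "proto Monic"),
    ("proto Mon", "proto Monic"),
    ("Nyah Kur [Central]", "proto Nyah Kur"),
    ("Mon [Rao]", "proto Mon"),
]


def build_index(links):
    """Index the links in both directions."""
    parent_of = dict(links)
    children_of = defaultdict(list)
    for child, parent in links:
        children_of[parent].append(child)
    return parent_of, children_of


def ancestors(item, parent_of):
    """Walk upward from reflex to etymon."""
    chain = []
    while item in parent_of:
        item = parent_of[item]
        chain.append(item)
    return chain


def sisters(item, parent_of, children_of):
    """Items that point to the same parent as item."""
    parent = parent_of.get(item)
    if parent is None:
        return []
    return [child for child in children_of[parent] if child != item]


def descendants(item, children_of):
    """Recursively walk downward from etymon to reflex."""
    found = []
    for child in children_of.get(item, []):
        found.append(child)
        found.extend(descendants(child, children_of))
    return found


parent_of, children_of = build_index(LINKS)
print(ancestors("Mon [Rao]", parent_of))
print(sisters("proto Mon", parent_of, children_of))
```

Either display order, etymon to reflex or reflex to etymon, falls out of the same two indexes.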

6.3. Extensions to Content and Tagging 

Original content will sometimes be extended in specific ways 
in order to enhance its utility for the etymological database. In 
particular: 

phonemic rendering: when orthography alone is supplied by 
the original source, we provide a preliminary phonemic 
rendering. 

‘dialect’ tagging: there is not always a clear distinction 
between the identification provided by dialect, author, and 
place names. In practice, all serve to denote language subsets 
that are treated as being distinct for the purpose of analysis. In 
preparing datasets, we use the most appropriate information 
available to preserve this distinction under the rubric ‘dialect.’ 

grouping: comparative dictionaries (and even more so field 
notes) often group items without either proposing earlier 
reconstructions or elevating a particular citation to a unique 
primary status. In such cases it is convenient to generate a 
‘dummy’ root for the sake of grouping, as discussed below. 

inferred glossing: when reconstructions and citations are 
presented in sets, individual items are not always glossed. In 
preparing datasets, we infer glosses so that extracted subsets 
will be comprehensible. Inferred glosses are always given 
between braces: { ... }. Glosses inferred from more than one 
item are separated: { ... // ... }. 

For example, the items below are from Diffloth 1980 (which 
glosses reconstructions) and 1984 (which glosses citations). 
Inferred glosses are shown between curly braces: 

*kon    child                                              proto Waic    Dif1980:R:N4 
*koon   {child, offspring, ... // (one's own) child; ...}  proto Monic   Dif1984:R:N168 
kawn    {child}                                            Wa [Drage]    Dif1980:C:N4-1 
kon     child, offspring, ...                              Mon [Rao]     Dif1984:C:N168-2 

In all cases, any additions to or modifications of the 
preliminary texts are thoroughly documented. 
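
The brace convention for inferred glosses is simple enough to sketch. The function name inferred_gloss is hypothetical, and dropping repeated glosses is our own assumption rather than a documented project rule.

```python
def inferred_gloss(*source_glosses):
    """Format glosses borrowed from related items as {a // b}.

    Empty glosses are skipped and repeats are dropped (an assumption);
    the original order is kept.
    """
    cleaned = (gloss.strip() for gloss in source_glosses if gloss and gloss.strip())
    unique = list(dict.fromkeys(cleaned))
    if not unique:
        return ""
    return "{" + " // ".join(unique) + "}"
```

For example, inferred_gloss("rat, mouse", "rat") gives {rat, mouse // rat}, the form displayed for proto Monic in the search results earlier.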

6.3.1. The Dummy “root” Link 

Diffloth’s Monic and Waic datasets consist of values that have 
been explicitly linked together. But it is frequently the case 
that several items are known or believed to be etymologically 
related, but do not have an explicitly stated etymon. In such 
cases, a dummy entry whose itemID value is “root” can be 
used to group the citations. 

For example, Huffman’s unpublished vocabulary list 
(1971) is a broad collection of citations from some 20 Mon-Khmer 
languages. A fairly cursory survey is sufficient to mark 
the occasional Tai or Indic reflex, and to group the remainder 
into likely etymological sets. The same process can be applied 
to Banker (1979). 
<item id="Ban1979:C:58-1" lang="Bahnar" 
dialect="Kontum"><ipa>hooj</ipa><orth>h6i</orth> 
<gloss>a few</gloss></item> 

<item id="Ban1979:C:58-2" lang="Bahnar" 
dialect="Pleiku"><ipa>huj</ipa><orth>hui</orth> 
<gloss>a few</gloss></item> 

<item id="Ban1979:C:58-3" lang="Bahnar" 
dialect="Galar"><ipa>huj</ipa><orth>hui</orth> 
<gloss>a few</gloss></item> 

<link f="Ban1979:C:58-1" to="root" id="Ban1979:L:58-1"/> 
<link f="Ban1979:C:58-2" to="root" id="Ban1979:L:58-2"/> 
<link f="Ban1979:C:58-3" to="root" id="Ban1979:L:58-3"/> 

Internally, the root reference is automatically replaced by a 
dummy reconstruction (e.g. named Banl979:R:58 ). The root 
and target are only used in collecting sisters from the same 
language (albeit different sources or dialects). At a future date, 
when this data is more thoroughly analyzed by the project, the 
‘to’ reference is readily replaced by the ID of a formal 
reconstruction. 
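
The replacement of the root reference can be illustrated with a short Python sketch. It is not the project's implementation. It assumes that links are held as plain dictionaries with f, to and id keys, and that the dummy reconstruction ID is derived from the citation ID by swapping the C segment for R and dropping the item suffix, which is merely a rule consistent with the example name given above. The function names are hypothetical.

```python
import re


def dummy_reconstruction_id(citation_id):
    """Derive a dummy reconstruction ID from a citation ID.

    Assumed ID shape: <source>:C:<set>-<n>, e.g. 'Ban1979:C:58-1'.
    """
    match = re.fullmatch(r"([^:]+):C:(.+)-(\d+)", citation_id)
    if match is None:
        raise ValueError(f"unexpected citation ID: {citation_id!r}")
    source, set_number, _item = match.groups()
    return f"{source}:R:{set_number}"


def resolve_root_links(links):
    """Return copies of the links with every 'root' target replaced."""
    resolved = []
    for link in links:
        target = link["to"]
        if target == "root":
            target = dummy_reconstruction_id(link["f"])
        resolved.append({**link, "to": target})
    return resolved


def group_by_target(links):
    """Collect the citation IDs that point at the same target."""
    groups = {}
    for link in links:
        groups.setdefault(link["to"], []).append(link["f"])
    return groups


links = [
    {"f": "Ban1979:C:58-1", "to": "root", "id": "Ban1979:L:58-1"},
    {"f": "Ban1979:C:58-2", "to": "root", "id": "Ban1979:L:58-2"},
    {"f": "Ban1979:C:58-3", "to": "root", "id": "Ban1979:L:58-3"},
]
print(group_by_target(resolve_root_links(links)))
```

Because only the target field changes, a later formal reconstruction can take over the group simply by rewriting that one value in each link.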

7. Collaboration in the Mon-Khmer Languages Project 

An important goal of the project is to provide a collaborative 
workspace for researchers in the field. Just as anonymous peer 
review is an essential component of quality in publication, the 
open distribution and discussion of preliminary data and 
results can be an important stage in the evolution of work 
intended for publication. 

The functionality provided by the language and etymology 
databases, including the ability to aggregate data across 
dialects, languages, or branches, to perform sophisticated 
phonemic searches, and to propose and view the implications 
of new analyses, is not intended to be reserved for previously 
published data or theories. Rather, the databases provide an 
important forum for seeing ideas in action, in an on-line 
context that can easily be accessed and tested by colleagues 
around the world. 

The idea that data of different vintages, so to speak, may 
be intermingled in a database without becoming irrevocably 
intermixed is not commonly encountered. Nevertheless, this 
functionality is easily provided by the authorID and dataID 
mechanisms, and is discussed further below. 

7.1. Adding Content to the Database(s) 

The Mon-Khmer Languages Project welcomes additions of 
data and analyses to the databases. Elaborate formatting of 
such additions is not required. Our experience thus far has 
been that most datasets can be tagged purely on the basis of 
their existing internal layout, regardless of whether they are 
columnar, labeled and indented, or tagged in some other 
manner. 

During the first phase of the project (2007-2009) we will 
assume responsibility for extracting data and tagging it in 
MKLP format. In the future, we will continue to do so, but will 
also specify simple submission formats (e.g. tabbed or labeled 
values) that can be tagged automatically. Submission of data in 
electronic form is highly preferable. 

Submissions may be copyrighted, of course, but 
contributors should have the expectation that data may be 
reused if full attribution is given, in accordance with traditional 
academic procedure. 

7.2. Types of Additions 
Additions take three forms: 

data consists of citations and reconstructions. Each should 
include at a minimum a language or proto-language name, a 
dialect identifier if appropriate, a phonemic and/or orthographic 
rendering, and a gloss. Additional information, such as 
part-of-speech, may also be supplied. 

relations have three parts, as seen in the response tag 
documentation in table 3: f[rom] and t[o] fields (from is 
always more recent, to is always older), and a type (and 
possibly a subtype) field. 

notes must include a r[eference] attribute that identifies an 
item, relation, or note. 

Note that contributions may consist solely of relations 
(and/or notes). The link below makes the claim that the 
Dif1984:N1 group is a reflex of Sho2006:R:93.A. Similar links 
might provide commentary. In principle, hundreds or 
thousands of such links might be submitted. 

<link f="Dif1984:R:N1" t="Sho2006:R:93.A" id="Coo2007:L:1" type="reflex" /> 

The Mon-Khmer Languages Project is committed to 
accepting all such contributions, subject to the limitations of 
good taste and common sense. The ID mechanism is used to 
include or exclude contributions, just as it can include or 
exclude ordinary data sets. A subtler degree of 
control is provided by the options seen earlier in the Reanalysis 
area of the restrict tab: 


Allow non-conflicting add-ons   Additional links and notes are allowed, 
                                but only when they do not conflict 
                                with the original author 

Allow all additions             Additions may override original authors 

Allow original authors only     Only original authors may override 

Allow author-approved           Original data suppliers may certify 
                                additions 

Allow additions by ...          Specify IDs for inclusion 

Disallow additions by ...       Specify IDs for exclusion 
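
The inclusion and exclusion of contributions by ID can be sketched as a simple filter. The Python below is only an illustration under stated assumptions: contributions are represented as dictionaries carrying an id such as Coo2007:L:1, the part before the first colon is treated as the contributor prefix, and the function names are hypothetical rather than part of the database software.

```python
def contributor_of(item_id):
    """Return the contributor prefix of an ID such as 'Coo2007:L:1'."""
    return item_id.split(":", 1)[0]


def select_contributions(items, include=None, exclude=None):
    """Keep the items whose contributor passes both ID lists.

    include=None means that no inclusion list was specified.
    """
    excluded = set(exclude or ())
    kept = []
    for item in items:
        contributor = contributor_of(item["id"])
        if include is not None and contributor not in include:
            continue
        if contributor in excluded:
            continue
        kept.append(item)
    return kept


items = [
    {"id": "Dif1984:R:N1"},
    {"id": "Sho2006:R:93.A"},
    {"id": "Coo2007:L:1"},
]
# 'Disallow additions by ...' with one excluded contributor.
print(select_contributions(items, exclude={"Coo2007"}))
# 'Allow additions by ...' with an explicit inclusion list.
print(select_contributions(items, include={"Dif1984", "Coo2007"}))
```

The conflict-sensitive options (non-conflicting add-ons, author-approved additions) would need knowledge of what each link overrides, which this sketch does not attempt.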

8. Database Contents 

The sources originally proposed for entry during the first two 
years of the Mon-Khmer Languages Project are shown below. 
Asterisked items have already been at least partially entered, as 
have Huffman (1971), Sidwell (2005), and Shorto (2006). 

ASLIAN BRANCH (Geoffrey Benjamin, advisor) 

  Temiar          Means, Nathalie. 1999. Temiar-English, 
                  English-Temiar Dictionary. 
  Semai/Senoi     Means, Nathalie & Paul B. Means. 1987. 
                  Senoi-English English-Senoi Dictionary. 

BAHNARIC BRANCH (Paul Sidwell, advisor) 

  Sedang          Smith, Kenneth. 2000. Sedang Dictionary. 
* Bahnar          Banker, Banker & Mo'. 1979. Bahnar Dictionary, 
                  Plei Bong-Mang Yang Dialect. 
  Chrau           Thomas, David & Dorothy Thomas. 1961. 
                  Chrau-Vietnamese-English. 

KATUIC BRANCH (Paul Sidwell, advisor) 

  Ngeq            Smith, Ron. 1976. Ngeq dictionary. 
  Pacoh           Watson, Watson, Cubuat. 1979. Pacoh 
                  Dictionary: Pacoh-Vietnamese-English. 

KHASIC BRANCH (Anne Daladier, advisor) 

* Khasi           Nissor Singh, U. 1906. English-Khasi Dictionary. 

KHMUIC BRANCH (Suwilai Premsirat, advisor) 

  Khmu            Suwilai Premsirat. 2002. Thesaurus of Khmu 
                  Dialects in Southeast Asia. 

MONIC BRANCH (Christian Bauer, advisor) 

* Proto-Monic     Diffloth, G. 1984. The Dvaravati Old-Mon 
                  language and Nyah Kur. 
  Nyah Kur        Luang-Thongkum, Theraphan. 1984. Nyah Kur 
                  - Thai - English Dictionary. 

NICOBARIC BRANCH 

* Car             Whitehead, George. 1925. Dictionary of the 
                  Car-Nicobarese language. 

PAKANIC/MANGIC BRANCH (Jerold Edmondson, advisor) 

  Bolyu           Edmondson, Jerold. 1995. Lexica: English-Bolyu 
                  Glossary. Mon-Khmer Studies. 
  Lai             Liang, M. 1984. A brief description of the Lai 
                  language. Minzu Yuwen. 

PALAUNGIC BRANCH (Justin Watkins, advisor) 

* Proto-Waic      Diffloth, Gerard. 1980. The Wa Languages. 
  Palaung         Milne, Leslie. 1931. A dictionary of English- 
                  Palaung and Palaung-English. 
  Wa              SOAS Wa Dictionary Project database. 

PEARIC BRANCH (Suwilai Premsirat, advisor) 

  Chong, Chung, Kasong, SonSamre, Su’ung, Pear 
                  Suwilai Premsirat, ms. A Comparative lexicon 
                  of 8 Endangered Pearic Languages. 

VIETIC BRANCH (Michel Ferlus, Mark Alves, advisors) 

* pViet-Muong     Ferlus, Michel. ms. Proto-Viet-Muong 
                  reconstruction and comparative lexicon. 
  Muong           Barker, M. E. & M. A. Barker. 1976. Muong- 
                  Vietnamese-English Dictionary. 
  Ruc             Nguyen Phu Phong, Tran Tri Doi, Ferlus. 1998. 
                  Lexique vietnamien-ruc-français. 
                  Solncev, V. M., N. V. Solnceva & I. V. 
                  Samarina. 2001. Jazyk ruk. 

one of the ethnic groups which migrated from Laos 
to Thailand more than 50 years ago during the 
French occupation. They crossed the Mekhong River to 
Tha Bo District, Nongkhai Province and later 
dispersed to other provinces. The majority of them 
are in Nakornphanom, Sakonnakhon, Mukdahan, 
Ubonratchathani and Udornthani provinces. 


The Vietnamese are one of the ethnic groups that 
have been absorbed into Thai society. Their 
cultural identity, especially their language, is gradually 
losing ground. But there are some groups in 
some provinces who have made efforts to retain 
their ethnic identity, and they teach the Vietnamese 
language to the young generation for free in order to 
maintain their Vietnamese cultural identity in their 
community. The purpose of this study is to 
investigate the attitudes of three generations 
of such Vietnamese (15-30, 35-50, 55+ years old) 
towards the preservation of their language in the provinces 
mentioned above. The findings are as follows: 


1. All three generations are proud to be 
Vietnamese. 

2. The young generation are not sure 
whether the Vietnamese language is important 
for them. 

3. People under 40 have lower skills in 
the Vietnamese language. 

4. The schools in the Vietnamese community 
should have a Vietnamese language class. 

5. The local government offices should promote 
the learning of Vietnamese along with other 
elements of Vietnamese culture. 

6. All three generations agree that using 
the Vietnamese language among themselves makes 
them feel united in the same group. 

Nowadays Vietnamese courses are 
taught in many institutions in Thailand. Contact 
and exchange between Thailand and Vietnam are 
wide open; people can visit or study in 
Vietnam. Therefore, there is hope that this 
language will be maintained in Thai society. 

Introduction 

Vietnamese belongs to the Vietic branch of the Austroasiatic 
language family. There are approximately 80,000 Vietnamese 
people in Thailand. The Vietnamese migrated to Thailand, 
especially to its northeastern provinces, because 
French troops tried to seize the areas of Tha Khaek, Kham 
Muan, in Laos but were attacked by Vietnamese and Lao 
armies up to 21st March 1946. Tha Khaek and other cities 
in Laos near the Khong River were seized and ruthlessly 
attacked by the French. The Vietnamese people, as well as 
Vietnamese and Lao troops, migrated to Thailand by crossing 
the Khong River into various provinces of Thailand near the 
river: Chiang Saen District (Chiang Rai Province) in 
the North, and Nongkhai, Nakornphanom, Mukdahan and 
Ubonratchathani provinces in the Northeast. The number of 
migrant Vietnamese was about 40,000-50,000 (Thanyatip 
2005: 24, 30). Some overseas Vietnamese described the 
Mekhong River at Tha Khaek as the “Sorrowful River” because of 
the vicious fighting of the French troops against the 
Vietnamese and Lao troops. 

Thai policy towards the Vietnamese migrants in 
Thailand varied over time, depending on 
how afraid the government was of the spread of Communism 
in the region, the strategic opposition to Communism of the 
United States, and the seriousness of the wars against the 
French and US armies in Vietnam. The Thai government set 
up rules to control Vietnamese migrants which affected their 
daily lives, such as fixing their areas of habitation and 
occupation. They had to get permission from the local 
authorities to leave these areas. They were afraid of arrest 
because government authorities watched them, fearing they 
might cause problems, and because the Thai army and 
government were close allies of the US. The authorities tried to 
prevent the Vietnamese migrants from helping the North 
Vietnamese troops fighting against the US in Vietnam. The 
suffering of the Vietnamese migrants lasted 40 years. The Thai 
government led by Marshal Chatchai Chunhawan then adopted a policy 
of changing the Indochinese battlefield into a commercial field. 
As a result, many rules were changed (Ibid: 70-71, 177). 

As the Vietnamese had to endure an oppressive life, they had 
no freedom to display their culture openly. They did not have a 
chance to study in Thai schools, but they gradually assimilated 
into Thai society. Many Vietnamese migrants have obtained 
Thai citizenship and the young generation study in the Thai 
education system. Nowadays, much of their cultural identity, 
including language, is gradually being lost. People under forty 
cannot use Vietnamese fluently. There are some Vietnamese 
people in northeastern provinces who are concerned about 
their cultural identity and are trying to maintain their language 
and some customs. 

This language situation inspired me to conduct this research, 
which investigates the language attitudes, language use and 
language ability of Vietnamese in three age groups 
(<25, 26-50, >51 years old) in the provinces of 
Nongkhai, Sakonnakhon, Ubonratchathani and Udornthani. 
Mukdahan and Nakornphanom provinces are not included in 
this analysis because data collection there is not yet complete. 

I. Vietnamese Language Status in Thailand 

In Vietnam, Vietnamese is the national language, but in 
Thailand, the Vietnamese are one of the ethnic groups living 
under Thai law. Their language in Thai society can be 
classified as a marginal regional language in the sense of 
Smalley (1994:115-119). Although he did not state clearly 
what the status of the Vietnamese language in Thailand was, 
Vietnamese can be categorized in this way on the basis of 
the following criteria from his description of marginal 
regional languages: 

1) They tend to be lumped with the “minority languages” which 
statistically they are, but they differ in important ways from 
other categories of languages. 

2) They are sometimes thought of as “trade languages”. 

3) They are not in the mainstream of Thai life. 

4) They are another step down in the hierarchy of communication 
networks in the country. 

Marginal regional languages are classified as “marginal” 
because they extend into Thailand as fringes of larger groups 
on the other side of the border where they are located. 

The Vietnamese language is used among Vietnamese 
communities in the northeast of Thailand. Members of these 
communities, like local Thai people, also speak the 
Northeastern Thai dialect (Isan) and Central Thai (Standard 
Thai), which is used as the official language in the media and 
the educational system. Therefore, the Vietnamese people can use 
Vietnamese as well as the Northeastern Thai and Central Thai 
dialects. They are bilingual, whereas people under 40 can use 
only the Thai and Northeastern Thai dialects well. The latter can 
understand some Vietnamese but cannot speak it fluently, except 
in families who teach their children Vietnamese or let 
their children attend Vietnamese classes. 


In terms of Thai education, courses in Vietnamese as a 
national language have been taught in higher education, both in 
military and civil institutions, for more than 30 years. In the 
past, soldiers studied Vietnamese for military purposes, but 
Thai students and others now study Vietnamese as a third or fourth 
foreign language. The Vietnamese language became popular 
after the Thai government changed its policy from the 
battlefield to the commercial field. Diplomatic relations 
between Thailand and Vietnam have developed well. Trade 
and investment by Thai businesses in Vietnam are booming. 
Exchanges between people are open and easy. The need for 
Vietnamese language experts is increasing especially in the 
Thai business sector. Therefore, some universities in Thailand 
have responded to the positive policy by providing a 
Although this trend has just started, it appears to have been 
successful and well received. 


The Vietnamese dialects spoken in Thailand, especially in 
the Northeast, are mainly the Northern and Central dialects, 
because the Vietnamese migrated from Nghe An and Ha Tinh, 
two provinces in the northern part of Vietnam, and from areas in the 
central part of Vietnam such as Quang Binh province. 


II. Sociolinguistic Factors Relating to the Decrease of the 
Vietnamese 

The Vietnamese people in the northeast of Thailand, like local 
Thais, speak the Northeastern Thai dialect, known as the Isan dialect, 
and the Central Thai dialect, which is identified as Standard 
Thai; both terms are used in this paper. Thai-Vietnamese 
people is the term used to refer to this group of people in this 
paper. They are bilingual in Vietnamese and Thai. 

III. Contributions of the Vietnamese Communities for 
Language and Culture Maintenance 

Nowadays, many Vietnamese communities are aware of the 
need to maintain their language and culture, and offer their 
premises or time to teach Vietnamese to the public, as in the 
following examples. 

In Nakornphanom, one Vietnamese businessman has 
provided a building and volunteer teachers to teach 
neighboring languages: Thai, Vietnamese and Lao to the 
public for free. There are classes for kids up to secondary 
school. The students learn in small groups taught by a 
volunteer teacher. Moreover, there is a group of Vietnamese 
exchange students from Vietnam studying at Ratchabhat 
Nakornphanom University learning Thai from a volunteer Thai 
teacher who can speak Vietnamese and Lao. The atmosphere 
in this school is very friendly. This school has been recognized 
by the province as well as the Vietnamese government. The 
owner was awarded a certificate from the government of 
Vietnam for his devotion to the maintenance of Vietnamese 
culture. It is the learning center of the province. Moreover, the 
owner of the tallest building has allowed a Community Radio 
to broadcast from his building for free. 

In Nakornphanom, the government of Vietnam has 
contributed money to the construction of a new school for 
teaching Vietnamese, but due to Thai regulations, it cannot 
start teaching independently and so has not opened yet. 

There is a museum and Ho Chi Minh village at Na Jok 
Village, Muang District, Nakornphanom Province, which is 
about 5 kilometers from the center. It was the village where Ho 
Chi Minh spent time teaching Vietnamese children and others 
about the world situation, the situation in Vietnam, politics, 
Marxism and Leninism, patriotism, human relationships, etc. It 
was also the place for weapons training against the French 
(Ibid., p. 65). This is a village where the Vietnamese people 
mix with the local Thai people, but they can maintain elements 
of their culture such as language, ceremonies and a Vietnamese 
temple and cemetery. The museum is taken care of by the 
province and the Vietnamese people there. It is a place of 
friendship between the governments of Thailand and Vietnam. 
The Foreign Ministry of Thailand has provided a generous 
budget to build a digital library in the museum. This museum 
is one of the tourist spots of Nakornphanom. 

In Mukdahan Province, there is a Community College 
which provides Vietnamese courses taught by the Vietnamese 
from the province. Here, we can feel the Vietnamese 
atmosphere. From early morning, there is the Vietnamese 
market selling Vietnamese foods which are popular. 
Customers are Vietnamese and Thai. There is a big market 
called “Indochinese Market” near Khong River which sells 
goods from China, Vietnam, Laos and Thailand. It is a tourist 
spot of Mukdahan Province. There is a bridge linking 
Mukdahan to Savannakhet in Laos. From this spot, there is the 
short route through Laos to Vietnam’s Quang Tri Province in 
central Vietnam. During holidays, Mukdahan Province is full 
of tourists from in and outside the country. 

In Nongkhai Province there is a Vietnamese family selling 
Vietnamese food in the evening and offering free Vietnamese 
classes at their house to children who are interested in the 
Vietnamese language. The owner, who is about 50 years old and 
graduated from Thai primary school, developed the Vietnamese 
curriculum by himself and teaches his free classes at home. 
Moreover, he is a volunteer teacher of Vietnamese as an 
elective subject in a school in the province. The school is 
interested in this language and has encouraged the program by 
hiring a young native teacher from Vietnam to 
teach Vietnamese in the school. Moreover, the Vietnamese 
community has tried to maintain its culture, with women 
dressing in the Vietnamese national dress called “ao dai” (long 
cloth) during Vietnamese ceremonies and provincial activities. 
Vietnamese restaurants can be found easily in this province. 

In Udornthani Province, there are groups of Vietnamese 
people who have tried to maintain their language as well by 
providing extra classes. Any children who are interested in 
learning Vietnamese can study individually or in groups. The 
teacher can teach other subjects too. It is like a private tutor 
teaching at the teacher’s home. Moreover, Ratchabhat 
Udornthani University provides Vietnamese courses for the 
public on occasion and Vietnamese are sometimes invited to 
teach. The permanent Thai teacher of this university has been 
sent to Ha Noi, Vietnam for his Ph.D in Vietnamese 
linguistics. 

Other cultural aspects include several famous Vietnamese 
food shops and restaurants in this province. There are 
Vietnamese temples where traditional beliefs are practised. 
They serve as centers for local Vietnamese people. 

In Ubonratchathani Province, there are two universities 
providing a Vietnamese curriculum: Ratchabhat 
Ubonratchathani University and Ubonratchathani University. 
At the Ratchabhat University, the Vietnamese course was not popular 
and after some time it was shut down. At Ubonratchathani 
University, there is a curriculum from Bachelor to Master 
degree, but currently only the Bachelor degree is provided. 
The Thai teacher has been sent to study for a Ph.D. in Ha Noi too. 

IV. Vietnamese Language Attitude in Some Northeastern 
Provinces of Thailand 

Various methods including questionnaire, informal 
observation and interview were used in investigating the 
Vietnamese language situation in provinces in the Northeast of 
Thailand. 

The questionnaire is the main tool for data collection. 
There are 5 main parts, as follows: 

(i) Part 1: Personal information, composed mainly of 
open questions such as No. 3 “Where is your father’s 
hometown?” and including several closed questions 
such as No. 1 “Sex: 1. Male 2. Female”. 

(ii) Part 2: Language use in daily life. 

(iii) Part 3: Attitude towards their own ethnic group. 

(iv) Part 4: Attitude towards their language and identity. 

(v) Part 5: Language ability. 

200 questionnaires from 4 provinces are used for analysis; 
data collection from Nakornphanom and Mukdahan 
Provinces is not yet complete. Each part of the study 
is presented below. 

Part 1: Personal Information 

This part investigates the personal background of each 
respondent. It contains 15 questions: address, sex, age, father’s 
hometown and language, mother’s hometown and language, 
place of birth, marital status, spouse’s hometown and language, 
education. The findings are as follows: 

Sex: There are 94 male respondents (47%) and 106 female 
respondents (53%). 

Age: They are divided into 3 groups: 

- 48 respondents (24%) are in the <25 age group. 

- 76 respondents (38%) are in the 26-50 age group. 

- 76 respondents (38%) are in the >51 age group. 

Ethnic group of father: 98 respondents (49%) are 
Vietnamese and 102 respondents (51%) are Thai-Vietnamese. 

Ethnic group of mother: 96 respondents (48%) are 
Vietnamese and 104 respondents (52%) are Thai-Vietnamese. 

Education: A high percentage of Thai-Vietnamese people 
are educated, especially the young generation. The older 
generation confronted difficulties when they migrated to 
Thailand, so, they did not have a chance to attend Thai school. 
But for the younger generation, their parents support them well. 
The details of education levels of these people are shown as 
follows: 

-47 respondents (23.5%) did not study. 

-31 respondents (15.5%) studied in Primary School. 

-57 respondents (28.5%) studied in Secondary School. 

-51 respondents (25.5%) studied in Tertiary School 
(Occupational College and University) 

-14 respondents (7%) studied by themselves. 

Occupation: The majority (65%) are traders. They are 
merchants of various kinds of goods such as selling food in the 
market, owners of shops or companies located in the center of 
cities, as in the following details: 

-130 respondents (65%) are traders. 

- 4 respondents (2%) are government officials. 

-15 respondents (7.5%) are employees. 

-13 respondents (6.5%) are skilled-labourers. 

-25 respondents (12.5%) are students. 

-6 respondents (3%) are housewives. 

-7 respondents (3.5%) did not identify. 

Mother Tongue: The mother tongue of the Thai- 
Vietnamese in the Northeast of Thailand can be shown as 
follows: 


-81 respondents (40.5%) speak Thai. 





-97 respondents (48.5%) speak Vietnamese. 

-1 respondent (0.5%) speaks Lao. 

-7 respondents (3.5%) speak Thai-Vietnamese. 

-9 respondents (4.5%) speak Vietnamese-Thai. 

-2 respondents (1.0%) speak Thai-Lao. 

-3 respondents (1.5%) did not identify. 

Migration: Fewer Thai-Vietnamese in the northeast of 
Thailand have moved to other provinces than have 
never moved. In the past, due to restrictions and 
controls in the reserved areas, it was not easy for them to travel 
freely. Leaving the reserved areas required permission from 
the local authorities, which discouraged the Thai-Vietnamese 
people from moving unless necessary. The details are 
as follows: 

-50 respondents (25%) have moved to some other place, either 
inside the country or abroad. 

-150 respondents (75%) have never moved elsewhere. 

Marital status: 

-106 respondents (53%) are married. 

-76 respondents (38%) are single. 

-15 respondents (7.5%) are divorced or widowed. 

Ethnic group of spouse: The data calculated from the 106 
married respondents are as follows: 

-57 respondents (53.77%) are married to other Thai- 
Vietnamese. 

-28 respondents (26.41%) are married to a Thai partner. 

-20 respondents (18.86%) are married to a Vietnamese partner. 

-1 respondent (0.94%) is married to a Laotian. 

Table 1: The number and percentage of participants classified by 
background 

Variables                    Number (200)   Percentage (100%)

Sex
  Male                        94             47
  Female                      106            53
Age
  <25                         48             24
  26-50                       76             38
  >51                         76             38
Ethnic group of father
  Vietnamese                  98             49
  Thai-Vietnamese             102            51
Ethnic group of mother
  Vietnamese                  96             48
  Thai-Vietnamese             104            52
Education
  No education                47             23.5
  Primary                     31             15.5
  Secondary                   57             28.5
  Tertiary                    51             25.5
  Self-study                  14             7.0
Occupation
  Trader                      130            65.0
  Govt. official              4              2.0
  Employee                    15             7.5
  Skilled-labourers           13             6.5
  Student                     25             12.5
  Housewife                   6              3.0
  Not identified              7              3.5
Mother Tongue
  Thai                        81             40.5
  Vietnamese                  97             48.5
  Laos                        1              0.5
  Thai-Vietnamese             7              3.5
  Vietnamese-Thai             9              4.5
  Thai-Laos                   2              1.0
  Not identified              3              1.5
Emigration
  Ever                        50             25
  Never                       150            75
Marital status
  Single                      76             38
  Married                     106            53
  Divorced or widowed         15             7.5
Ethnic group of spouse (from 106 married couples)
  Thai-Vietnamese             57             53.77
  Thai                        28             26.41
  Vietnamese                  20             18.86
  Laos                        1              0.94
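
The percentages in Table 1 are simple shares of the relevant base (200 respondents, or 106 married respondents for the spouse rows), and they can be recomputed as a check. The Python sketch below is only illustrative; the function name percentages and the dictionary layout are assumptions made for the example, while the counts are those printed in the table.

```python
def percentages(counts, decimals=2):
    """Return each count as a percentage of the column total."""
    total = sum(counts.values())
    return {label: round(100 * n / total, decimals) for label, n in counts.items()}


# Counts as printed in Table 1.
mother_tongue = {
    "Thai": 81, "Vietnamese": 97, "Laos": 1, "Thai-Vietnamese": 7,
    "Vietnamese-Thai": 9, "Thai-Laos": 2, "Not identified": 3,
}
spouse = {"Thai-Vietnamese": 57, "Thai": 28, "Vietnamese": 20, "Laos": 1}

print(sum(mother_tongue.values()), percentages(mother_tongue))
print(sum(spouse.values()), percentages(spouse))
```

With conventional rounding the spouse shares for Thai and Vietnamese partners come out as 26.42 and 18.87; the table prints 26.41 and 18.86, which suggests that the published figures were truncated rather than rounded.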


Based on the above data, age, occupation and education are 
used as variables in the investigation of language use and 
language attitudes, as follows: 


Part 2: Language use in daily life 

Language use in daily life can be classified into 3 main 
situations: 

2.1 Inside the family 

2.2 Outside the family 

2.3 Personal skills 

2.1. Inside the family: 

The questions on language use inside the family cover 
the following: 


- language use with the grandparents 

- language use with the parents 

- language use with the cousin 

- language use with the children 

- language use with other relatives 

- language used the most in the family 

The results show that, inside the family, Vietnamese is the 
language used most across all age groups (45.74% of responses), 
followed by Standard Thai (36.20%), Isan (17.91%) and 
another language which was not identified (0.15%). 

People in the over 51 age group use Vietnamese with all 
relatives inside the family the most. They also use more 
Standard Thai and Isan with same-aged and junior relatives. 

People in the 26-50 age group use Vietnamese language 
with seniors more than same-aged and junior relatives. They 
use Standard Thai the most with children. 


The younger generation (<25) use Standard Thai the most 
and use the Vietnamese language the least in 
the family. The percentages of language use inside the family 
are shown in the following table: 

Table 2: Percentages of language use inside the family 

Language situation               Age     VN            Isan          ST            Others     Total

10. What language do you use     <25     18 (12.24%)   7 (36.84%)    22 (64.70%)   -          47 (23.5%)
    with your grandparents?      26-50   57 (38.77%)   12 (63.16%)   12 (35.29%)   -          81 (40.5%)
                                 >51     72 (48.98%)   -             -             -          72 (36%)
    Total (means of percentage)          147 (73.5%)   19 (9.5%)     34 (17%)      -          200

11. What language do you use     <25     11 (7.80%)    7 (20.59%)    36 (63.16%)   -          54 (23.27%)
    with your parents?           26-50   56 (39.72%)   25 (73.53%)   19 (33.33%)   -          100 (43.10%)
                                 >51     74 (52.48%)   2 (5.88%)     2 (3.51%)     -          78 (33.62%)
    Total (means of percentage)          141 (60.34%)  34 (14.65%)   57 (24.57%)   -          232

12. What language do you use     <25     7 (7.07%)     5 (9.62%)     40 (41.67%)   -          52 (21.05%)
    with your cousin?            26-50   31 (31.31%)   31 (59.62%)   35 (36.46%)   -          97 (39.27%)
                                 >51     61 (61.62%)   16 (30.77%)   21 (21.88%)   -          98 (39.68%)
    Total (means of percentage)          99 (40.08%)   52 (21.05%)   96 (38.87%)   -          247

13. What language do you use     <25     1 (1.89%)     4 (11.43%)    11 (11.58%)   -          16 (8.74%)
    with your children?          26-50   15 (28.30%)   9 (25.71%)    43 (45.26%)   -          67 (36.61%)
                                 >51     37 (69.81%)   22 (62.86%)   41 (43.16%)   -          100 (54.64%)
    Total (means of percentage)          53 (28.96%)   35 (19.12%)   95 (51.91%)   -          183

14. What language do you use     <25     10 (9%)       6 (11.11%)    36 (31.58%)   1          53 (18.93%)
    with other relatives such    26-50   40 (36.04%)   29 (53.70%)   42 (36.84%)   -          111 (39.64%)
    as uncle, aunt, niece,       >51     61 (54.95%)   19 (35.19%)   36 (31.58%)   -          116 (41.43%)
    nephew...?
    Total (means of percentage)          111 (39.64%)  54 (19.28%)   114 (40.71%)  1 (0.35%)  280

15. What language do you use     <25     5 (6.94%)     8 (16%)       35 (36.08%)   1 (100%)   49 (16.73%)
    the most in the family?      26-50   24 (33.33%)   23 (46%)      40 (41.24%)   -          87 (39.55%)
                                 >51     43 (59.72%)   19 (38%)      22 (22.68%)   -          84 (38.18%)
    Total (means of percentage)          72 (32.72%)   50 (22.73%)   97 (44.09%)   1 (0.45%)  220

Total                                    623           244           493           2          1,362
Means as percentages                     45.74         17.91         36.20         0.15       100%








The above table presents the following details:


No. 10: language use with grandparents:

-73.5% of the three age groups use Vietnamese with their grandparents.

-17% of the young and middle age groups use Standard Thai with
their grandparents.

-9.5% of the young and middle age groups use Isan with their
grandparents.

No. 11: language use with parents:

-60.34% of the three age groups use Vietnamese with their parents.
-24.57% of the three age groups use Standard Thai with their parents.
-14.65% of the three age groups use Isan with their parents.

No. 12: language use with cousins:

-40.08% of the three age groups use Vietnamese with their cousins.
-38.87% of the three age groups use Standard Thai with their cousins.
-21.05% of the three age groups use Isan with their cousins.

No. 13: language use with children:

-51.91% of the three age groups use Standard Thai with their
children.

-28.96% of the three age groups use Vietnamese with their children.

-19.12% of the three age groups use Isan with their children.

No. 14: language use with other relatives:

-40.71% of the three age groups use Standard Thai with their
relatives.

-39.64% of the three age groups use Vietnamese with their relatives.

-19.28% of the three age groups use Isan with their relatives.

-0.35% of the young age group use another language (not identified)
with their relatives.

No. 15: language used the most in the family:

-44.09% of the three age groups use Standard Thai the most in
the family.

-32.72% of the three age groups use Vietnamese the most in the
family.

-22.73% of the three age groups use Isan the most in the family.
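
The percentages in the table rest on two different bases, which is easy to miss when reading the figures: each age-group cell is a share of its language column, while the "Total / Means of percentage" row is a share of all answers to the question. A minimal Python sketch, using the counts reported for question 12 (language used with cousins), illustrates this. The variable and function names are illustrative only and are not part of the study.

```python
# Counts for question 12 (language used with cousins), as given in the table.
counts = {
    "<=25":  {"VN": 7,  "Isan": 5,  "ST": 40},
    "26-50": {"VN": 31, "Isan": 31, "ST": 35},
    ">=51":  {"VN": 61, "Isan": 16, "ST": 21},
}
languages = ["VN", "Isan", "ST"]


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Column totals: how many respondents chose each language.
column_totals = {
    lang: sum(row[lang] for row in counts.values()) for lang in languages
}
grand_total = sum(column_totals.values())

# Cell percentages: share of each age group WITHIN a language column.
cell_shares = {
    age: {lang: pct(row[lang], column_totals[lang]) for lang in languages}
    for age, row in counts.items()
}

# Total-row percentages: share of each language among ALL answers.
total_row_shares = {
    lang: pct(column_totals[lang], grand_total) for lang in languages
}

print(column_totals, grand_total)
print(total_row_shares)
```

Run on these counts, the totals row comes out as 99, 52 and 96 answers out of 247, which matches the 40.08%, 21.05% and 38.87% quoted for question 12.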

2.2. Outside the family 

The questions about language use outside the family cover the
following situations:

- language use with a Vietnamese speaker in the market

- language use when meeting a non-Vietnamese

- language use when greeting a Vietnamese friend

- language use with non-Vietnamese sellers

- language use with colleagues

- language use with childhood friends

The study finds that Standard Thai is the language used the most
outside the family domain across the three age groups.

People in the under 25 age group use Standard Thai the
most in all situations.

Vietnamese is used the most by the middle age and senior
groups with Vietnamese people. Standard Thai and Isan are
used the most by people in the 26-50 age group with
non-Vietnamese people. Isan is used the most by people in the
51 and over age group with non-Vietnamese people.


Table 3 Percentages of the language use outside the family 


Language situation / Age        VN             Isan           ST             Others         Total

1. When you meet a Vietnamese speaker in the market, what language do you use?
    ≤25                         16 (11.11%)    9 (23.68%)     32 (57.14%)    -              57 (23.85%)
    26-50                       56 (38.89%)    18 (47.37%)    19 (33.93%)    1 (100%)       93 (38.91%)
    ≥51                         72 (50%)       11 (28.95%)    5 (8.93%)      -              89 (37.24%)
    Total (means of percentage) 144 (60.25%)   38 (15.90%)    56 (23.43%)    1 (0.42%)      239

2. When you meet a non-Vietnamese speaker, what language do you use?
    ≤25                         -              11 (10.18%)    36 (35.64%)    -              47 (21.17%)
    26-50                       5 (38.46%)     41 (37.96%)    44 (43.56%)    -              90 (40.54%)
    ≥51                         8 (61.64%)     56 (51.85%)    21 (20.79%)    -              85 (38.29%)
    Total (means of percentage) 13 (5.85%)     108 (48.65%)   101 (45.49%)   -              222

3. What language do you use to greet a Vietnamese friend?
    ≤25                         6 (5.31%)      8 (15.69%)     39 (46.99%)    -              53 (21.46%)
    26-50                       42 (37.17%)    27 (52.94%)    35 (42.17%)    -              104 (42.10%)
    ≥51                         65 (57.52%)    16 (31.37%)    9 (10.84%)     -              90 (36.44%)
    Total (means of percentage) 113 (45.75%)   51 (20.64%)    83 (33.60%)    -              247

4. What language do you use to a non-Vietnamese seller?
    ≤25                         1 (20%)        10 (9.26%)     39 (33.62%)    -              50 (21.83%)
    26-50                       2 (40%)        41 (37.96%)    49 (42.24%)    -              92 (40.17%)
    ≥51                         2 (40%)        57 (52.78%)    28 (24.14%)    -              87 (38%)
    Total (means of percentage) 5 (2.18%)      108 (47.16%)   116 (50.65%)   -              229

5. What language do you use to a colleague?
    ≤25                         -              7 (8.54%)      43 (40.95%)    1 (100%)       51 (22.08%)
    26-50                       4 (9.30%)      40 (48.78%)    42 (40%)       -              86 (37.23%)
    ≥51                         39 (90.7%)     35 (42.68%)    20 (19.05%)    -              94 (40.7%)
    Total (means of percentage) 43 (18.61%)    82 (35.50%)    105 (45.45%)   1 (0.43%)      231

6. What language do you use to a childhood friend?
    ≤25                         2 (2.32%)      10 (16.39%)    40 (47.62%)    -              53 (22.84%)
    26-50                       22 (25.58%)    35 (57.38%)    32 (38.09%)    -              89 (38.36%)
    ≥51                         62 (72.09%)    16 (26.23%)    12 (14.28%)    -              90 (38.79%)
    Total (means of percentage) 86 (37.07%)    61 (26.29%)    84 (36.21%)    -              232

Total                           404            448            545            3              1400
Means as percentages            28.86          32             38.93          0.21           100%


The above table presents the following details: 


No. 1 language use with the Vietnamese in the market: 

-60.25% of the three age groups use Vietnamese. 

-23.43% of the three age groups use Standard Thai. 

-15.90% of the three age groups use Isan. 

-0.42% of the three age groups use another language (not 
identified). 

No. 2 language use when meeting with a non-Vietnamese: 

-48.65% of the three age groups use Isan. 
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-45.49% of the three age groups use Standard Thai. 

-5.85% of the middle and senior groups use Vietnamese. 

No. 3 language use when greeting a Vietnamese friend: 

-45.75% of the three age groups use Vietnamese. 

-33.60% of the three age groups use Standard Thai. 

-20.64% of the three age groups use Isan. 

No. 4 language use to non-Vietnamese seller: 

-50.65% of the three age groups use Standard Thai. 

-47.16% of the three age groups use Isan. 

-2.18% of the three age groups use Vietnamese. 

No. 5 language use with a colleague: 

-45.45% of the three age groups use Standard Thai. 

-35.50% of the three age groups use Isan. 

-18.61% of the middle and senior groups use Vietnamese. 

-0.43% of the young group use another language (not identified).

No. 6 language use to a childhood friend: 

-37.07% of the three age groups use Vietnamese. 

-36.21% of the three age groups use Standard Thai. 

-26.29% of the three age groups use Isan. 

2.3. Personal skills 

This domain covers language use in thinking and in activities
such as counting and singing. The findings are as follows:

Standard Thai is used the most (54.08%) for personal skill 
expression. Vietnamese is the second most frequently used 
language (33.28%) and Isan the third (12.17%). 





People in the >51 age group use Vietnamese the most in all
circumstances.

The other two age groups use Standard Thai the most in all 
circumstances. 


Table 4: Percentages of language use of personal skill expression 


Language situation / Age        VN             Isan           ST             Others         Total of each age group  Percentage

7. What language do you use when you think?
    ≤25                         1 (2.08%)      7 (14.58%)     40 (83.33%)    -              48                       22.02%
    26-50                       18 (21.18%)    19 (22.35%)    48 (56.47%)    1 (1.18%)      85                       38.99%
    ≥51                         54 (63.53%)    12 (14.12%)    18 (21.18%)    -              85                       38.99%
    Total of all age groups     73 (33.49%)    38 (17.43%)    106 (48.62%)   1 (0.46%)      218                      100%

8. What language do you use when you count numbers?
    ≤25                         -              4 (8.51%)      43 (91.49%)    -              47                       21.46%
    26-50                       20 (22.73%)    19 (21.6%)     48 (54.54%)    1 (12.5%)      88                       40.18%
    ≥51                         54 (64.28%)    14 (16.67%)    16 (19.05%)    -              84                       38.35%
    Total of all age groups     74 (33.79%)    37 (16.90%)    107 (48.86%)   1 (0.46%)      219                      100%

9. What language do you mainly sing in?
    ≤25                         -              1 (2.08%)      47 (97.92%)    -              48                       22.64%
    26-50                       17 (20.99%)    2 (2.47%)      61 (75.31%)    1 (1.23%)      81                       38.21%
    ≥51                         52 (62.65%)    1 (1.20%)      30 (36.14%)    -              83                       39.15%
    Total of all age groups     69 (32.55%)    4 (1.89%)      138 (65.09%)   1 (0.47%)      212                      100%

Total                           216            79             351            3              649
Means as percentages            33.28          12.17          54.08          0.47           100%
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The details of each question numbered above are presented 

below as follows: 

No. 7: for thinking:

-37.73% of the young age group and 45.28% of the middle age
group use Standard Thai.

-73.97% of the senior group use Vietnamese the most.

-14.58% of the young group use Isan the most and 2.08% use
Vietnamese the most for thinking.

-22.35% of the middle age group use Isan and 21.18% of them
use Vietnamese the most for thinking.

-21.18% of the senior group use Standard Thai the most for
thinking and 14.12% of them use Isan.

No. 8: for counting numbers:

-91.49% of the young age group and 54.54% of the middle age
group use Standard Thai the most. 8.51% of the young group
use Isan the most. None of them use Vietnamese.

-22.73% of the middle age group use Vietnamese the most and
21.6% of them use Isan.

-64.28% of the senior group use Vietnamese the most, 19.05%
of them use Standard Thai and 16.67% of them use Isan.

No. 9: for singing:

-97.92% of the young group and 75.31% of the middle age group
use Standard Thai the most.

-62.65% of the senior group use Vietnamese.

-2.08% of the young group use Isan. None of them use
Vietnamese for singing.

-20.99% of the middle age group use Vietnamese and 2.47% of
them use Isan the most.

-36.14% of the senior group use Standard Thai the most and
1.20% of the senior group use Isan.
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Part 3: Language Attitude toward their Ethnicity 

This section investigates the respondents' attitudes toward their
ethnicity by means of 8 statements. The findings are as follows:

According to the study, 45.87% of the responses from the three age
groups agree with the 8 statements, which expresses a positive
attitude toward their ethnicity and other related issues. 14.13%
strongly agree and 25.74% are neutral. 14.26% are negative
(disagree or strongly disagree).
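
The 14.26% negative share is the sum of the "disagree" and "strongly disagree" responses. As a rough check, the sketch below recomputes the shares from the grand totals in the last row of Table 5 (214, 695, 390, 180 and 36 responses out of 1,515). The grouping into positive, neutral and negative is an assumption made here for illustration, and the names are hypothetical.

```python
# Grand totals per attitude level, taken from the last row of Table 5.
totals = {
    "strongly agree": 214,
    "agree": 695,
    "neutral": 390,
    "disagree": 180,
    "strongly disagree": 36,
}

# Assumed grouping of the five levels into three broader attitudes.
groups = {
    "positive": ("strongly agree", "agree"),
    "neutral": ("neutral",),
    "negative": ("disagree", "strongly disagree"),
}

n_responses = sum(totals.values())

# Share of all responses falling into each broader attitude.
collapsed = {
    name: round(100 * sum(totals[level] for level in levels) / n_responses, 2)
    for name, levels in groups.items()
}

for name, share in collapsed.items():
    print(f"{name}: {share:.2f}%")
```

With these totals the negative share is 14.26% (11.88% plus 2.38%) and the neutral share 25.74%, in line with the figures in the text.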

The percentages of language attitude toward their ethnicity 
are shown in the following table: 


Table 5: Percentages of language attitude toward their ethnicity 



Statement / Age                 Strongly agree     Agree              Neutral            Disagree           Strongly disagree  Total

1. Thai-Vietnamese is a big group.
    ≤25                         7 (36.84%)         18 (21.18%)        16 (26.23%)        7 (58.33%)         -                  48 (26.67%)
    26-50                       2 (10.53%)         23 (27.06%)        32 (52.46%)        -                  1 (33.3%)          58 (32.22%)
    ≥51                         10 (52.63%)        44 (51.76%)        13 (21.31%)        5 (41.67%)         2 (66.67%)         74 (41.11%)
    Total (percentages)         19 (10.55%)        85 (47.22%)        61 (33.89%)        12 (6.67%)         3 (1.67%)          180 (100%)

2. Thai-Vietnamese people are rich.
    ≤25                         1 (20%)            9 (15.79%)         26 (28.26%)        10 (29.41%)        -                  46 (23.96%)
    26-50                       2 (40%)            23 (40.35%)        32 (34.78%)        15 (44.12%)        2 (50%)            74 (38.54%)
    ≥51                         2 (40%)            25 (43.86%)        34 (36.96%)        9 (26.47%)         2 (50%)            72 (37.5%)
    Total (percentages)         5 (2.61%)          57 (29.69%)        92 (47.92%)        34 (17.71%)        4 (2.08%)          192 (100%)

3. Most of local leaders are Thai.
    ≤25                         15 (41.67%)        25 (23.36%)        2                  1                  -                  43 (22.63%)
    26-50                       23 (63.89%)        43 (40.19%)        3 (37.5%)          2 (40%)            3 (100%)           74 (38.95%)
    ≥51                         29 (80.56%)        39 (36.45%)        3 (37.5%)          2 (40%)            -                  73 (38.42%)
    Total (percentages)         36 (18.95%)        107 (56.32%)       8 (4.21%)          5 (2.63%)          3 (1.58%)          190 (100%)

4. In the next 50 years, there will be the people who speak Vietnamese.
    ≤25                         3 (56.60%)         21 (30.43%)        23 (31.51%)        2 (2.82%)          -                  49 (25.25%)
    26-50                       5 (9.43%)          30 (43.48%)        28 (38.36%)        10 (29.41%)        1 (100%)           74 (38.14%)
    ≥51                         9 (16.98%)         18 (26.09%)        22 (30.14%)        22 (64.70%)        -                  71 (36.60%)
    Total (percentages)         53 (27.32%)        69 (35.57%)        73 (37.63%)        34 (17.52%)        1 (0.52%)          194

5. Thai-Vietnamese people prefer to speak in Vietnamese.
    ≤25                         8 (25%)            33 (25%)           6 (23.08%)         -                  1 (100%)           48 (24.49%)
    26-50                       10 (31.25%)        48 (36.36%)        14 (53.85%)        3 (60%)            -                  75 (38.26%)
    ≥51                         14 (43.75%)        51 (38.64%)        6 (23.08%)         2 (40%)            -                  73 (37.24%)
    Total (percentages)         32 (16.33%)        132 (67.35%)       26 (13.26%)        5 (2.55%)          1 (0.51%)          196 (100%)

6. The schools in your area encourage children to speak Vietnamese.
    ≤25                         4 (22.22%)         18 (23.68%)        11 (22.44%)        7 (21.87%)         5 (41.67%)         45 (24.06%)
    26-50                       5 (27.78%)         24 (31.58%)        24 (48.97%)        15 (46.87%)        4 (33.33%)         72 (38.50%)
    ≥51                         9 (50%)            34 (56.67%)        14 (28.57%)        10 (31.25%)        3 (25%)            70 (37.43%)
    Total (percentages)         18 (9.62%)         76 (40.64%)        49 (26.20%)        32 (17.11%)        12 (6.42%)         187 (100%)

7. The local authorities support Vietnamese culture.
    ≤25                         5 (25%)            18 (20%)           14 (29.17%)        7 (25%)            2 (66.67%)         46 (24.34%)
    26-50                       6 (30%)            32 (35.55%)        21 (43.45%)        13 (46.43%)        1 (33.33%)         73 (38.62%)
    ≥51                         9 (45%)            40 (44.44%)        13 (27.08%)        8 (28.57%)         -                  70 (37.04%)
    Total (percentages)         20 (10.58%)        90 (47.62%)        48 (25.40%)        28 (14.81%)        3 (1.59%)          189 (100%)

8. The schools encourage the good attitude of the language pride to the students.
    ≤25                         5 (13.89%)         20 (25.32%)        9 (27.27%)         9 (13.33%)         3 (33.33%)         46 (24.60%)
    26-50                       16 (44.44%)        23 (29.11%)        15 (26.92%)        15 (23.81%)        2 (22.22%)         71 (37.87%)
    ≥51                         15 (41.66%)        36 (45.57%)        9 (19.23%)         6 (9.52%)          4 (44.44%)         70 (37.43%)
    Total (percentages)         36 (19.25%)        79 (42.24%)        33 (17.65%)        63 (33.69%)        9 (4.81%)          187 (100%)

Total                           214                695                390                180                36                 1515
Percentages                     14.13%             45.87%             25.74%             11.88%             2.38%              100%


According to the above statements with the related 
percentages, the findings of each statement are discussed as 
follows: 
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No. 1 Thai-Vietnamese is a big group: 

-47.22% of three groups agree. 

-33.89% of three groups are neutral. 

-10.55% of three groups strongly agree. 

-6.67% of the young and senior groups disagree. 

-1.67% of three groups strongly disagree.

No. 2 Thai-Vietnamese people are rich: 

-47.92% of the three age groups are neutral. 

-29.69% of the three age groups agree. 

-17.71% of the three age groups disagree. 

-2.61% of the three age groups strongly agree. 

-2.08% of the middle age and senior groups strongly disagree. 


Regarding wealth, the majority of the Vietnamese people are in
business, so their economic status is quite good. But some are
middle class, not rich.

No. 3 Most of local leaders are Thai.: 

-56.32% of the three age groups agree. 

-18.95% of the three age groups strongly agree. 

-4.21% of the three age groups are neutral. 

-2.63% of the three age groups disagree.

-1.58% of the middle age group strongly disagree.

No. 4 In the next 50 years, there will be the people who speak 
Vietnamese.: 

-37.63% of the three age groups are neutral. 

-35.57% of the three age groups agree. 

-27.32% of the three age groups strongly agree. 

-17.52% of the three age groups disagree. 
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-0.52% of the three age groups strongly disagree. 

No. 5 Thai-Vietnamese people prefer to speak in Vietnamese.: 

-67.35% of the three age groups agree. 

-16.33% of the three age groups strongly agree. 

-13.26% of the three age groups are neutral. 

-2.55% of the middle and senior groups disagree. 

-0.51% of the young group strongly disagrees. 

No. 6 The schools in your area encourage the children to speak
Vietnamese.:

-40.64% of three age groups agree. 

-26.20% of the three age groups are neutral. 

-17.11% of the three age groups disagree. 

-9.62% of the three age groups strongly agree. 

-6.42% of the three age groups strongly disagree. 

No. 7 The local authorities support the Vietnamese culture.: 

-47.62% of the three age groups agree. 

-25.40% of the three age groups are neutral. 

-14.81% of the three age groups disagree. 

-10.58% of the three age groups strongly agree.

-1.59% of the three age groups strongly disagree.

No. 8 The schools encourage the good attitude of the language 
pride to the students: 

-42.24% of three age groups agree. 

-33.69% of three age groups disagree. 

-19.25% of three age groups strongly agree.

-17.65% of three age groups are neutral. 

-4.81% of three age groups strongly disagree. 
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Regarding the results of No. 6-8, currently the socio-political 
situation and relations between Thailand and Vietnam are very 
good. Thai-Vietnamese people dare to open themselves to 
outsiders. Their social status is accepted. Therefore, many 
schools in many areas encourage students, both Thai and Thai- 
Vietnamese, to learn more Vietnamese as an elective subject 
and to take part in cultural activities. But some schools do not 
link with the Thai-Vietnamese community and do not 
encourage students to learn more about the Vietnamese 
language. Therefore, attitudes vary from place to place. 

Part 4: Attitudes toward the Vietnamese Language and 
Identity 

This section will investigate how the Thai-Vietnamese 
respondents express their attitude towards their language and 
themselves in response to 20 statements. The general result is 
as follows: 

-49.32% of the three age groups agree with the 20 statements. 

-21.82% of the three age groups strongly agree with the 20 
statements. 

-15.40% of the three age groups are neutral on these 20 statements.

-10.71% of the three age groups disagree with the 20 statements.

-2.78% of the three age groups strongly disagree with the 20
statements.

It is noticeable that the negative attitude percentages toward 
their own language and identity are lower than the positive 
attitude percentages. 
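
The overall percentages listed above can be checked against the response counts in the final row of Table 6 (814, 1842, 575, 400 and 104 out of 3,735). The sketch below is only a consistency check under an assumed tolerance of 0.02 percentage points. It is not part of the original analysis, and all names in it are hypothetical.

```python
# Response counts per attitude level, from the final row of Table 6.
counts = {
    "strongly agree": 814,
    "agree": 1842,
    "neutral": 575,
    "disagree": 400,
    "strongly disagree": 104,
}

# Percentages as quoted in the running text of this section.
reported = {
    "strongly agree": 21.82,
    "agree": 49.32,
    "neutral": 15.40,
    "disagree": 10.71,
    "strongly disagree": 2.78,
}

TOLERANCE = 0.02  # assumed allowance for rounding, in percentage points

total = sum(counts.values())

# Percentages recomputed from the raw counts (left unrounded).
computed = {level: 100 * n / total for level, n in counts.items()}

# Levels whose reported figure differs from the recomputed one
# by more than the tolerance.
mismatches = [
    level
    for level in counts
    if abs(computed[level] - reported[level]) > TOLERANCE
]

for level in counts:
    flag = "CHECK" if level in mismatches else "ok"
    print(f"{level:18s} reported {reported[level]:6.2f}  "
          f"computed {computed[level]:6.2f}  {flag}")
```

Under these assumptions only "strongly agree" is flagged: the counts give about 21.79%, against 21.82% in the text and 21.80 in the table. The remaining figures agree to within rounding.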

Table 6 Percentages of attitude towards their language and identity 
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Total 

Means 

percentage 


2. Vietnamese 
language has a 
specific 
characteristics 

<.25 

26-50 

≥51

11

(20.37%) 

14 

(25.92%) 

29 

(53.70%) 

21 

(33.33%) 

19 

(23.46%) 

45 

(55.56%) 

9 

(47.37%) 

10 

(52.63%) 

Total 


54 

91 

19 

Means of 

percentage 

! 

(32.73%) 

(55.15%) 

(11.51%)


3. Use <25 

Vietnamese 

language to the 26-50 

children is the 

best, ^51 


(19.44%) (21.05%) 

16 27 

(44.44%) (28,42%) 

13 48 

(36.11%) (50.53%) 


12 

(30%) 

17 

(42.5%) 

11 

(27.5%) 


1 

_ 

165 

(0.61%) 


(100%) 


Total 

Means of 

percentage 


4. The best way 
to maintain 
Vietnamese 
language is 
using only 
Vietnamese in 
the family. 


Total 

Means of 

percentage 


5. You think ^25 

that Vietnamese 
is important in 26-50 

your daily life. 


36 95 

(20.22%) (53.37%) 



4 

(7.55%) 

18 


(25.53%) 

43 


(33.97%) (45.74%) 

31 27 

(58.49%) (28.72%) 


53 94 

(27.46%) (48.70%) 


(7.40%) (25.61%) 

12 29 

(44.44%) (35.36%) 

13 32 

(48.15%) (39.02%) 


40 

(22.47%) 



(31.03%) 

9 

(31.03%) 

11 

(37.93%) 


16 

(26.23%) 

26 

(42.62%) 

19 

(31.15%) 


(52.94%) 

6 

(35.29%) 

2 

(11.76%) 


29 

17 

(15.02%) 

(8.81%) 


Total 


27 


82 
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Total 

Means 

percentage 



83 

64 

20 

(43.23%) 

(33.33%)

(10.42%) 
Statement / Age                 Strongly agree     Agree              Neutral            Disagree           Strongly disagree  Total

11. You will feel better if you tell anybody that you are a Thai.
    ≤25                         -                  9 (34.62%)         10 (45.45%)        21 (25.61%)        8 (40%)            48
    26-50                       1 (50%)            14 (53.85%)        13 (59.09%)        40 (48.78%)        7 (35%)            75
    ≥51                         1 (50%)            3 (11.54%)         9 (40.90%)         21 (25.61%)        5 (25%)            39
    Total (means of percentage) 2 (1.23%)          26 (16.05%)        32 (19.75%)        82 (50.62%)        20 (12.35%)        162

12. In the future, the numbers of the Vietnamese speakers reduce.
    ≤25                         3 (27.27%)         22 (19.47%)        11 (30.56%)        9 (34.62%)         1 (20%)            46
    26-50                       7 (63.64%)         43 (38.05%)        13 (36.11%)        7 (26.62%)         2 (40%)            72
    ≥51                         1 (9.09%)          48 (42.48%)        12 (33.33%)        10 (38.46%)        2 (40%)            73
    Total (means of percentage) 11 (5.76%)         113 (59.16%)       36 (18.85%)        26 (13.61%)        5 (2.62%)          191

13. At present, the numbers of the Vietnamese speakers reduce.
    ≤25                         3 (15%)            32 (23.70%)        9 (36%)            2 (12.5%)          1 (50%)            47
    26-50                       7 (35%)            54 (40%)           7 (28%)            6 (37.5%)          1 (50%)            75
    ≥51                         10 (50%)           49 (36.30%)        9 (36%)            8 (50%)            -                  76
    Total (means of percentage) 20 (10.10%)        135 (68.18%)       25 (12.63%)        16 (8.08%)         2 (1.01%)          198

14. You are shy to speak Vietnamese.
    ≤25                         -                  2 (28.57%)         7 (50%)            20 (22.73%)        16 (31.37%)        45
    26-50                       1 (100%)           5 (71.43%)         7 (50%)            43 (48.86%)        27 (52.94%)        83
    ≥51                         -                  -                  -                  25 (28.41%)        8 (15.69%)         33
    Total (means of percentage) 1 (0.62%)          7 (4.35%)          14 (8.7%)          88 (54.66%)        51 (31.68%)        161

15. You think that the local schools should provide the elective Vietnamese subject.
    ≤25                         12 (1.67%)         21 (23.33%)        3 (21.43%)         5 (16.67%)         -                  41
    26-50                       23 (34.33%)        35 (38.89%)        8 (57.12%)         2 (6.67%)          -                  68
    ≥51                         32 (47.76%)        34 (37.78%)        3 (21.43%)         23 (76.67%)        17 (100%)          109
    Total (means of percentage) 67 (30.73%)        90 (41.28%)        14 (6.42%)         30 (13.76%)        17 (7.80%)         218

16. You are proud of your language.
    ≤25                         17 (18.09%)        24 (27.27%)        5 (38.46%)         -                  -                  46
    26-50                       32 (34.04%)        35 (39.77%)        8 (61.54%)         1 (50%)            -                  76
    ≥51                         45 (47.87%)        29 (32.95%)        -                  1 (50%)            -                  75
    Total (means of percentage) 94 (47.71%)        88 (44.67%)        13 (6.60%)         2 (1.02%)          -                  197

17. You have a chance to watch the Cable TV Program from Vietnam regularly.
    ≤25                         1 (2.70%)          25 (25%)           12 (35.30%)        6 (42.86%)         1 (33.33%)         45
    26-50                       13 (35.13%)        32 (32%)           17 (50%)           8 (57.12%)         2 (66.67%)         72
    ≥51                         23 (62.16%)        43 (43%)           5 (14.71%)         -                  -                  71
    Total (means of percentage) 37 (19.68%)        100 (53.19%)       34 (18.08%)        14 (7.45%)         3 (1.60%)          188

18. It's a pity that the Vietnamese children do not speak Vietnamese.
    ≤25                         11 (18.64%)        24 (22.64%)        9 (64.28%)         -                  -                  44
    26-50                       22 (37.29%)        41 (38.68%)        4 (28.57%)         4                  1 (100%)           72
    ≥51                         26 (44.07%)        41 (38.68%)        1                  1                  -                  69
    Total (means of percentage) 59 (31.90%)        106 (57.30%)       14 (7.57%)         5 (2.70%)          1 (0.54%)          185

19. Use Vietnamese in group gives the sense of being the same group.
    ≤25                         8 (16.33%)         32 (26.23%)        8 (34.78%)         -                  -                  48
    26-50                       21 (42.86%)        44 (36.06%)        10 (43.49%)        -                  -                  75
    ≥51                         20 (40.82%)        46 (37.70%)        5 (21.74%)         1 (100%)           1 (100%)           73
    Total (means of percentage) 49 (25%)           122 (62.24%)       23 (11.73%)        1 (0.51%)          1 (0.51%)          196

20. There should be the Vietnamese broadcasting or publications in the community.
    ≤25                         2 (9.52%)          21 (21.65%)        12 (35.29%)        8 (29.63%)         -                  43
    26-50                       14 (66.67%)        30 (30.93%)        16 (47.06%)        10 (37.04%)        -                  70
    ≥51                         5 (23.81%)         46 (47.42%)        6 (17.65%)         9 (33.33%)         1 (100%)           67
    Total (means of percentage) 21 (11.67%)        97 (53.89%)        34 (18.89%)        27 (15%)           1 (0.55%)          180

Total                           814                1842               575                400                104                3735
Means of percentage             21.80              49.32              15.40              10.71              2.78               100


The details of each statement are presented as follows: 
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No. 1: They are proud to be born Vietnamese 
-52.54% of the three groups strongly agree.

-43.50% of them agree.

-2.25% of the young and middle age groups are neutral.

-1.69% of them disagree.

The majority agree or strongly agree that they are proud to
be Vietnamese born. But some middle age and young
people, about 3.94%, are neutral or disagree. They may have
had some political problems or other difficulties in Thai
society in the past; therefore, perhaps they resent being born
Vietnamese. They want to hide their identity.

No. 2: Vietnamese language has specific characteristics. 

-55.15% of the three groups agree. 32.73% of them strongly 
agree. 

-11.51% of the young and middle age groups are neutral. 0.61%
of the middle age group disagree.

-55.56% and 53.70% of the senior group agree and strongly 
agree. 

-No other attitude. 

No. 3: Use of Vietnamese language with children is the best. 

-53.37% of the three groups agree. 

-22.47% of them are neutral. 

-20.22% of them strongly agree. 

-3.93% of the middle age and senior groups disagree.

Regarding using Vietnamese language with their children, 
some Vietnamese families succeed well. Some did not do so in 
the past but encourage their children to learn more in extra 
Vietnamese classes privately or in schools. Some are not aware 
of this and use only Standard Thai or Isan, like local Thai 
people. 
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No. 4: The best way to maintain Vietnamese language is using only 
Vietnamese in the family. 

-48.70% of the three groups agree. 

-27.46% of them strongly agree. 

-15.02% of them are neutral. 

-8.81% of them disagree: 52.94% of the young group, 35.29% of 
the middle age and 11.76% of the senior disagree. 

The young and middle age groups disagree the most with using
only Vietnamese in the family, probably because their parents
are bilingual, so they may not see the need for using only
Vietnamese in the family. Moreover, if they have not been
taught Vietnamese since childhood, they are unlikely to be able
to use Vietnamese fluently with family members.

No. 5: You think that Vietnamese is important in your daily life. 

-42.93% of the three groups agree.

-31.94% of them are neutral. 

-14.14% of them strongly agree. 

-10.47% of them disagree. 

-0.52% of the middle age group strongly disagree, which is very
low compared with the agreement percentages.

No. 6: The local government should support Vietnamese language 
and culture. 

-56.02% of the three age groups agree. 

-31.94% of the three age groups strongly agree. 

-9.95% of the three age groups are neutral. 

-2.09% of the young and senior groups disagree. 

No. 7: The less the Vietnamese language is used, the more
Vietnamese identity is reduced.

-54.69% of the three groups agree. 

-27.60% of the three groups strongly agree. 

-10.42% of the three groups are neutral. 
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-6.25% of the three groups disagree. 

-1.04% of the young and middle age groups strongly disagree.

On this point, the majority agree that the Vietnamese language
is the marker of their ethnic identity.

No. 8: You think that Vietnamese language is easy to speak. 

-61.58% of the three groups agree.

-15.25% of the three groups strongly agree. 

-14.69% of the three groups are neutral. 

-8.47% of the three groups disagree. 

No. 9: You think that the Vietnamese scripts are not difficult. 
-56.65% of the three groups agree. 

-26.60% of the three groups are neutral. 

-11.82% of the three groups strongly agree. 

-4.93% of the three groups disagree. 

No. 10: You think that Vietnamese language is useful for you. 

-43.23% of the three groups agree. 

-33.33% of the three groups are neutral. 

-13.02% of the three groups strongly agree. 

-10.42% of the three groups disagree. 

No. 11: You will feel better if you tell anybody that you are a Thai. 

-50.62% of the three groups disagree. 

-19.75% of the three groups are neutral. 

-16.05% of the three groups agree. 

-12.35% of the three groups strongly disagree. 

-1.23% of the middle age and the senior groups strongly agree. 

The majority of the respondents disagree with hiding their
Vietnamese identity. But some people may have suffered at the
hands of the authorities in the past; therefore they are happy to
integrate into the Thai majority and become Thai.
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No. 12: In the future, the numbers of the Vietnamese speakers will 
decline. 

-59.16% of the three groups agree. 

-18.85% of the three groups are neutral. 

-13.61% of the three groups disagree. 

-5.76% of the three groups strongly agree. 

Higher percentages of agreement show that people have 
noticed that the younger generation of Thai-Vietnamese use 
Vietnamese less. Thai societal influences affect their language 
and identity, inevitably. 

No. 13: At present, the number of Vietnamese speakers is
declining.


-68.18% of the three groups agree.

-12.63% of the three groups are neutral.

-10.10% of the three groups strongly agree.

-8.08% of the three groups disagree.

-1.01% of the young and middle age groups strongly disagree.

These results are quite similar to those for the statement above.
The highest percentages of agreement, especially in the middle age
and senior groups, show that the number of Vietnamese
speakers will decline in the future. A small percentage
disagrees.

No. 14: You are shy to speak Vietnamese. 


-54.66% of the three groups disagree.

-31.68% of the three groups strongly disagree.

-8.7% of the three groups are neutral.

-4.35% of the young and middle age groups agree.

-0.62% of the middle age group strongly agree.

The highest percentages show that the Thai-Vietnamese people
are not shy to speak Vietnamese. Only a few of the young and
middle age groups are shy to do so.

No. 15: You think that the local schools should provide a
Vietnamese language elective.

-41.28% of the three groups agree. 

-30.73% of the three groups strongly agree. 

-13.76% of the three groups disagree. 

-7.80% of the senior group strongly disagree.

-6.42% of the three groups are neutral.

No. 16: You are proud of your language. 

-47.71% of the three groups strongly agree. 

-44.67% of the three groups agree. 

-6.60% of the young and middle age groups are neutral.

-1.02% of the middle age and senior groups disagree. 

The majority of the three groups are proud of their language. 
Only a few disagree. 

No. 17: You have a chance to watch the Cable TV Program from 
Vietnam regularly. 

-53.19% of the three groups agree. 

-19.68% of the three groups strongly agree. 

-18.08% of the three groups are neutral. 

-7.45% of the senior and middle age groups disagree. 

-1.60% of the senior and middle age groups strongly disagree.

No. 18: It’s a pity that Vietnamese children do not speak Vietnamese. 

-57.30% of the three groups agree. 

-31.90% of the three groups strongly agree. 

-7.57% of the three groups are neutral. 

-2.70% of the middle age and the senior disagree. 

-0.54% of the middle age strongly disagree. 

No. 19: Use of Vietnamese in a group gives a sense of belonging to
the same group. 

-62.24% of the three groups agree. 

-25% of the three groups strongly agree.

-11.73% of the three groups are neutral. 

-0.51% of the senior group disagree and strongly disagree. 

No. 20: There should be Vietnamese broadcasting or publications in 
the community. 

-53.89% of the three groups agree. 

-18.89% of the three groups are neutral. 

-15% of the three groups disagree. 

-11.67% of the three groups strongly agree. 

-0.55% of the senior strongly disagree. 

Part 5: Language Use Ability

There are 3 questions about the language use ability of the
respondents, as follows:

1) What languages can you speak? 

2) What was the first language of your childhood? 

3) What language are you most fluent in? 

The findings are as follows: 

-37.41% of the three groups can use Standard Thai. 

-35.61% of the three groups can use Vietnamese. 

-24.34% of the three groups can use Isan. 

-2.65% of the three groups can use other languages: English and 
French. 

According to these findings, the Thai-Vietnamese people are 
multilingual as shown in the table (see next page). 

Table 7 Language use ability

Question                          Age     Vietnamese      Isan            Standard Thai   English/French  Total
1. What languages can you speak?  ≤25     25 (14.80%)     29 (17.06%)     45 (23.81%)     9 (33.33%)      108 (19.46%)
                                  26-50   64 (37.87%)     70 (41.18%)     73 (38.62%)     16 (59.26%)     223 (40.18%)
                                  ≥51     80 (47.34%)     71 (41.76%)     71 (37.57%)     2 (7.41%)       224 (40.36%)
   Total                                  169             170             189             27              555
   Means of percentage                    (30.45%)        (30.63%)        (34.05%)        (4.86%)

2. What was the first language    ≤25     8 (6.30%)       5 (20.83%)      40 (50%)        1 (100%)        54 (23.27%)
   of your childhood?             26-50   47 (37.01%)     12 (50%)        29 (36.25%)     -               88 (37.93%)
                                  ≥51     72 (56.70%)     7 (29.17%)      11 (13.75%)     -               90 (38.80%)
   Total                                  127             24              80              1               232
   Means of percentage                    (54.74%)        (10.34%)        (34.48%)        (0.43%)

3. What is your most fluent       ≤25     2 (2.5%)        6 (9.52%)       43 (34.13%)     -               51 (18.96%)
   language?                      26-50   16 (20%)        26 (41.27%)     58 (46.03%)     -               100 (37.17%)
                                  ≥51     62 (77.5%)      31 (49.21%)     25 (19.84%)     -               118 (43.87%)
   Total                                  80              63              126             -               269
   Means of percentage                    (29.74%)        (23.42%)        (46.84%)

Total                                     376             257             395             28              1056
Means of percentage                       35.61           24.34           37.41           2.65            100


The details of the findings in the table are presented as follows: 


No. 1: What languages can you speak?

-34.05% of the three age groups can speak Standard Thai. 

-30.63% of the three age groups can speak Isan. 

-30.45% of the three age groups can speak Vietnamese. 

-4.86% of the three age groups can speak English and French. 

No. 2: What was the first language of your childhood?

-54.74% of the three age groups speak Vietnamese as the first
language.

-34.48% of the three age groups speak Standard Thai as the first
language.

-10.34% of the three age groups speak Isan as the first language.

-0.43% of the young group speak another language (not identified) as
the first language.

No. 3: What language are you most fluent in?

-46.84% of the three age groups speak Standard Thai the most
fluently.

-29.74% of the three age groups speak Vietnamese the most 
fluently. 

-23.42% of the three age groups speak Isan the most fluently. 
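
The "means of percentage" quoted above can be checked against the counts in Table 7. The following is a minimal sketch in Python, not part of the original study: it assumes that each figure is a language's column total divided by the total number of responses to the question, and that a dash or blank cell in the table stands for zero. The names COUNTS, column_totals and shares are illustrative only.

```python
# Counts read from Table 7 (order: Vietnamese, Isan, Standard Thai,
# English/French). Assumption: a dash or blank cell means 0.
LANGUAGES = ["Vietnamese", "Isan", "Standard Thai", "English/French"]

COUNTS = {
    "languages spoken": {
        "<=25": [25, 29, 45, 9],
        "26-50": [64, 70, 73, 16],
        ">=51": [80, 71, 71, 2],
    },
    "first language": {
        "<=25": [8, 5, 40, 1],
        "26-50": [47, 12, 29, 0],
        ">=51": [72, 7, 11, 0],
    },
    "most fluent language": {
        "<=25": [2, 6, 43, 0],
        "26-50": [16, 26, 58, 0],
        ">=51": [62, 31, 25, 0],
    },
}


def column_totals(rows):
    """Sum each language column over all rows of a mapping."""
    return [sum(column) for column in zip(*rows.values())]


def shares(totals):
    """Express each column total as a percentage of the overall total."""
    grand_total = sum(totals)
    return [round(100 * t / grand_total, 2) for t in totals]


# Column totals for each question, then for all three questions together.
per_question = {q: column_totals(rows) for q, rows in COUNTS.items()}
overall = column_totals(per_question)

if __name__ == "__main__":
    for question, totals in per_question.items():
        print(question, dict(zip(LANGUAGES, shares(totals))))
    print("all questions", dict(zip(LANGUAGES, shares(overall))))
```

Under these assumptions the computed shares agree with the percentages reported for each question and with the overall figures for the three questions combined.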

The summarized findings on the personal language skills of the
three age groups are as follows:

The respondents are generally good at listening to and
speaking Vietnamese, but some members of the young and middle
age groups cannot understand, speak or read Vietnamese. As for
writing, some members of all age groups cannot write
Vietnamese. The details are as follows:

No. 1. Vietnamese listening skill

-34.15% of the three age groups are good. 

-28.78% of the three age groups are fair. 

-21.95% of the three age groups are excellent. 

-12.19% of the young and middle age groups are not good. 

-2.93% of the young and middle age groups cannot understand 
when listening to Vietnamese language. 

No.2. Vietnamese speaking skill 

-33.33% of the three age groups are fair in speaking Vietnamese. 

-32.83% of the three age groups are good in speaking 
Vietnamese. 

-16.91% of the three age groups are excellent in speaking
Vietnamese.

-11.94% of the three age groups are not good in speaking 
Vietnamese. 

-4.98% of the young and middle age groups are not able to speak 
Vietnamese. 

No. 3. Vietnamese reading skill 

-25.36% of the young and middle age groups are not able to 
read Vietnamese. 

-23.41% of the three age groups are fair in reading Vietnamese. 

-20% of the three age groups are good in reading Vietnamese. 

-17.56% of the three age groups are excellent in reading 
Vietnamese. 

-13.66% of the three age groups are not good in reading 
Vietnamese. 

No. 4. Vietnamese writing skill 

-29.85% of the three age groups are not able to write Vietnamese. 

-22.39% of the three age groups are fair in writing Vietnamese. 

-17.91% of the three age groups are good in writing Vietnamese. 

-16.91% of the three age groups are excellent in writing 
Vietnamese. 

-12.93% of the three age groups are not good in writing 
Vietnamese. 

For the four language skills, some members of the middle age and
young groups (33.27% in total across the three skills) are not
able to understand, speak or read Vietnamese.

No. 5. Thai listening skill 

-40.40% of the three age groups are good in listening to Thai. 

-37.93% of the three age groups are excellent in listening to Thai. 

-17.24% of the three age groups are fair in listening to Thai. 

-4.43% of the middle and senior groups are not good in listening
to Thai.

No. 6. Thai speaking skill 

-37.93% of the three age groups are excellent in Thai speaking. 

-36.94% of the three age groups are good in Thai speaking. 

-23.15% of the three age groups are fair in Thai speaking. 

-4.43% of the middle and senior groups are not good in Thai
speaking.

No. 7. Thai reading skill 

-33.82% of the three age groups are good in Thai reading. 

-32.84% of the three age groups are excellent in Thai reading. 

-20% of the three age groups are fair in Thai reading. 

-10.29% of the middle age and senior groups are not good in 
Thai reading. 

-2.94% of the middle age and senior groups cannot read Thai. 

No. 8. Thai writing skill 

-28.92% of the three age groups are excellent in Thai writing. 

-25.49% of the three age groups are good in Thai writing. 

-22.55% of the three age groups are fair in Thai writing. 

-12.25% of the middle age and senior groups are not able to 
write Thai. 

-10.78% of the three age groups are not good in Thai writing.

For the Thai language, 12.25% of the middle age and senior
groups are not able to write Thai. It is possible that they
were not educated in the Thai school system.

Table 9 Personal language skills percentages 
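
[The rows for items 1 to 5 survive only as fragments; the legible cells are given below.]

2. You can speak Vietnamese.   Not able, total: 10 (4.98%); Total: 201

3. You can read Vietnamese.    Total row: Good 41 (20%), Fair 48 (23.41%), Not good 28 (13.66%), Not able 52 (25.36%); Total: 205
                               Not able by age: ≤25 27 (51.92%) of 48; 26-50 25 (48.08%) of 77; ≥51 - of 80

4. You can write Vietnamese.   Age     Good            Fair            Not good
                               ≤25     2 (5.56%)       5 (11.11%)      9 (34.62%)
                               26-50   10 (27.78%)     20 (44.44%)     9 (34.62%)
                               ≥51     24 (66.67%)     20 (44.45%)     8 (30.77%)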

Item                    Age     Excellent       Good            Fair            Not good        Not able        Total
6. You can speak Thai.  ≤25     28 (38.89%)     17 (22.67%)     3 (6.38%)       -               -               48
                        26-50   32 (44.44%)     30 (40%)        13 (27.66%)     2 (22.22%)      -               77
                        ≥51     12 (16.67%)     28 (37.33%)     31 (65.96%)     7 (77.78%)      -               78
   Total                        77              75              47              9               -               203
   Means of percentage          (37.93%)        (36.94%)        (23.15%)        (4.43%)

7. You can read Thai.   ≤25     28 (41.79%)     17 (24.64%)     4 (9.76%)       -               -               49
                        26-50   34 (50.75%)     25 (36.23%)     13 (31.71%)     2 (9.52%)       1 (16.67%)      75
                        ≥51     5 (7.46%)       27 (39.13%)     24 (58.54%)     19 (90.48%)     5 (83.33%)      80
   Total                        67              69              41              21              6               204
   Means of percentage          (32.84%)        (33.82%)        (20%)           (10.29%)        (2.94%)

8. You can write Thai.  ≤25     26 (44.07%)     17 (32.70%)     4 (8.70%)       1 (4.54%)       -               48
                        26-50   31 (52.54%)     22 (42.31%)     18 (39.13%)     5 (22.73%)      1 (4%)          77
                        ≥51     2 (3.39%)       13 (25%)        24 (52.17%)     16 (72.72%)     24 (96%)        79
   Total                        59              52              46              22              25              204
   Means of percentage          (28.92%)        (25.49%)        (22.55%)        (10.78%)        (12.25%)

Summary 


According to the study, the Vietnamese language is used the
most inside the family, followed by Standard Thai and Isan.
The senior group uses more Vietnamese with all relatives but
more Standard Thai and Isan with same-age and junior relatives.
The middle age group uses Vietnamese with their senior
relatives and Standard Thai with juniors. The young generation
uses Standard Thai with all relatives. The use of Vietnamese
inside the family is declining.

The Vietnamese language is used most with non-family
Vietnamese people by the senior and middle age groups. The
young group uses Standard Thai in all circumstances.
The middle age group uses Standard Thai and Isan with
non-Vietnamese people, but the senior group uses Isan the most
with non-Vietnamese people.

More than 60% of the respondents have a positive attitude
toward their ethnicity, language and identity. This is a good
sign that they are still proud of their roots. Some middle age
people in the provinces are aware of and keen to maintain their
language and culture. Although the young generation uses
Vietnamese little or not at all, if families and communities help
each other by providing Vietnamese courses privately or
formally in schools, it will encourage them to learn their
ancestral language. Moreover, if the young generation of
Thai-Vietnamese can study Vietnamese at university, this is another
way to revitalize and maintain the language. If all sectors of
the community understand the importance of the diversity of
language and culture, they will preserve and maintain their
traditions effectively in the face of globalization. Otherwise,
they will gradually be lost.

PROBING METAPHORICAL
UNDERSTANDING OF CHIEFDOM IN
SANENYO 1

S Winston Cruz 

Central Institute of Indian Languages, Mysore 

ABSTRACT 

Chiefdom as a system of social organization, 
simple or complex, is considered to be transitional. 
This is irrespective of whether one supports the 
evolutionary theory of political anthropology or 
believes in the structuralist-functionalist approach. 
Chowra Island of the Andaman and Nicobar 
archipelago has a socio-political system where some 
of the markers of chiefdom appear as integral 
characteristics. Reasons may differ from community 
to community on why they agree to enter into such a 
socio-political system. As the reasons for the
innovation of such systems differ, so does the native
understanding of the nature and structure of each
system. The interplay between language, cognition
and environment is often revealed in the
metaphors that are used to describe the system in
the native language. The metaphors may reflect the
complexity. They may also reflect the attitude of the
community about the system and play a role in the
passing of this understanding to the next generation.
The present paper describes the socio-political
system of the Nicobarese of Chowra in detail and
makes a preliminary investigation of the metaphors used in
the native language Sanenyo to describe the units
and participants.


1 Data for the present study was collected as part of the Project on the Cultural 
Documentation of the Nicobarese, undertaken by the Central Institute of Indian Languages, 
Mysore in collaboration with the Directorate of Sports, Arts and Culture, Andaman and 
Nicobar Administration, Port Blair. 

1. Introduction

In their influential book, Metaphors We Live By, Lakoff and
Johnson (1980) advanced the idea that the way we think, what
we experience and what we do every day is all very much a
matter of metaphor. While metaphors are essentially
comparisons, there is a tension between the objects in the
trope, and an accompanying pedagogic purpose, that remained
to be explained (Ortony 1975). Understanding and describing
metaphors becomes more complex in social anthropology, in
the study of ethnic systems, where the task is not only
describing the system and its referent metaphors but also
explaining why the system came into existence in the first
place (Leach 1976). In this paper, we will first study the
pre-tsunami indigenous socio-political organization that existed at
Chowra Island 2, Andaman and Nicobar archipelago, India. The
depiction of the organization will concentrate on the status and
function of the various units, the economic transactions between
the units, their members etc. Comparison and contrast with the
existing literature on the socio-political organization of
Chowra is adopted in this paper as the preferred
technique for advancing the academic understanding of the
indigenous organization. The classification of the organization
as a chiefdom, and the supporting factors or their lack, will be
analyzed further. An investigation of the titles of the units and
their members comes next. Studying the metaphorical implications
of these terms turns the present enterprise from an empiricist
into a rationalist one in the field of political anthropology.

2. Chowra Island Statistics 

Chowra, also spelt as ‘Chaura’, is considered the cultural 
capital of Nicobars by many ethnologists and anthropologists. 
Except for a few government officials posted on a short-term


2 Chowra Island, one of the worst affected in the tsunami of December 2004, had to be 
evacuated completely after the disaster. The people stayed in tents (refugee camps) in the 
neighboring Teressa Island for about two years before returning to Chowra. 
basis, Chowra boasts a 1376-strong mono-ethnic Mongoloid
community. While the Nicobarese communities of the Central and
Southern groups were suspected of having a mixture of Malay,
Chinese, Pre-Dravidian and even Indo-European blood, the
Chowrites were considered an exception and the purest type
(Bonnington 1931). Generally believed to be of short stature,
they are a strongly built, long-headed and mesocephalic people
(Prasad, B.V.R. and B. B. Rao 2004).

Sanenyo is the native tongue of the people of Chowra. Most
of them are also competent in Luro, Pu and Mout. Quite a few
of them can speak Hindi and English as well.

Chowra is an isolated island in the Central Group of the
Nicobar archipelago. It is 8.2 sq. km in area and is surrounded by a
coral reef. The sea is open on all sides. There are no rivers,
inlets, or noticeable bays. It is an entirely flat island except for
the tdheup, a hill of about 340 feet at the southern tip. Among
the vast spreads of coconut farms, there are insets of
grasslands and agricultural farms. The coconut farms are
interspersed with trees and plants of various kinds that make
them more of a forest than mere coconut groves. Inedible
pandanus is seen along the shore. The flora, according to the
islanders, also includes plants with medicinal value. There are
no wild animals at Chowra. Pigs, dogs, cats, chickens, and
pigeons are among the domesticated animals. The presence
of rodents and mice is reported. Pigs and domesticated boars
are reared both in villages and in coconut groves. Fowls of the
forest include parakeets and pigeons. Octopuses and various
kinds of fish are available from the sea. Fish are sparse near
the shore because of the openness of the sea. For the same
reason, there is a considerable absence of reef fish too.

Chowra is the most densely populated of all the islands
of the Nicobars. The community has settled in five villages,
viz., Raiheon, Kuitasuk, Chongamong, Alheat and Ta-eela,
located near the shore but not on the shore. Some vegetation is
found separating the villages from the shore. The villages at
Chowra extend continuously from the east to the north side of
the island. Villages are marked by mud beaten into solid
ground.

Au panam is the place where each party in the village has 
two community huts. It borders Chongamong and the fourth 
village, Alheat. Au panam is located facing the sea. The church, 
death house, delivery house, party community houses, and the 
cemetery form a complex. 

3. Tribal Councils 
3.1. Introduction 

The community at Chowra has a socio-political system of its 
own. This organization of the community reflects constant 
introspection and feedback over the years. There are two 
councils operating in parallel under a single Chief. One is 
indigenously developed and the other is a recent off-shoot of 
the Panchayat Raj Institution of Government of India. Both 
councils interact with each other at different levels in different 
degrees. 

The socio-political organization at Chowra has been
studied only with partial success by researchers. A summary
of the relevant parts of two studies, Reddy
and Sudersen (1986) and Justin (2003), which deal mainly with
the tuhet system at neighboring Car Nicobar Island, can lay the
base for our discussion.

In Reddy and Sudersen (1986), the kinem is the unit with a
socio-economic status at Car Nicobar. A kinem is a cognate
descent group consisting of collaterals of both sexes, including
their spouses. It is lineally neither agnate nor uterine. The
kinem, not the family, is the primary corporate group at Car
Nicobar.

A kinem has a head, who is usually the senior of the male line.
Above the kinem is the Village Council, constituted of the heads of
all the kinem in the village. This Council elects the Chief of the
Village Council. The Island Council is the supreme body at
Car Nicobar, where all the Village Captains are members. The
Island Council is headed by a Chief and a Vice Chief Captain.
By tradition, the Chief Captainship goes to a member from Seti
village.

It can be derived from the study that the Chief of the
Village Council is the head of one of the kinem in the village.
What remains unclear is whether the Chief of the Village
Council is the Village Captain or not. The authors also mention
the introduction of adult franchise in 1978. By the time
the study was published, adult franchise had become the norm.
The islanders elect three captains per village for five years
through a secret ballot. However, the tuhet system described in
the study survived.

According to Reddy and Sudersen (1986), a unit similar to
the kinem of Car Nicobar, which traces to a common
(unknown) ancestor, is the kunyee at Chowra. The kunyee too is
exogamous. A kunyee is constituted of more than one
yomkanala. A yomkanala is an agnate kin group. A yomkanala
is named after its senior-most living male member, and a
kunyee is named after the head of its senior yomkanala.

Both the yomkanala and the kunyee are corporate groups at
different levels. The head of the kunyee, kavee kunyee, is
responsible for the distribution and allocation of the farm and
forest land, according to the study. Three captains are
nominated by the island chiefs for each village. While there is
no Village Council at Chowra, the village captains are part of
the Island Council, kavayee, along with the three island chiefs.
The island chiefs are three kavee kunyee who are ranked first,
second and third, and are addressed as Chief, Second Chief, and
Third Chief respectively, according to this study. The
introduction of the term ‘captain’ is a colonial influence.

Though a valuable contribution, Reddy and Sudersen
(1986) remains partial in its attempt to describe the institution of
captainship at Chowra. The reason is that it does not separate
the indigenously developed system from the parallel village
council system developed to serve the PRI scheme of the
Government. Reddy and Sudersen (1986) is the outcome of
fieldwork undertaken in 1981 and 1971 at Car Nicobar and
Chowra respectively.

Justin (2003) reports that the three terms, kinem, mirooto, and
tuhet, are synonymous in common parlance at Car Nicobar.
A tuhet indicates the large extended joint family, that is,
the maximal lineage members identified under a particular name.
A tuhet is exogamous, with the family being patrilineal. The
residence pattern can be either patrilocal or matrilocal. Heads
of the tuhet are selected on the criteria of age,
proficiency, profound knowledge of the customary laws and
qualities of administrative leadership over a large lineage
group. The tuhet heads hold their office for a lifetime.
Successors are nominated, as a rule, on the basis of primogeniture.

While this is the general picture, kinem or mirooto, if they
were to be explained specifically, are the outcome of a tuhet,
where a group separates itself from the parental tuhet into
several tuhets, according to Justin (2003). The affiliation of
tuhets and kinem or mirooto witnessed at Car Nicobar was not
very discernible in the Central and Southern Nicobars
according to Justin (1990:49), but has become discernible at
least on Chowra Island over a decade (Justin 2003). The latter
study also mentions the unit chanong uveav as the local
equivalent at Chowra Island of a Car Nicobarese tuhet.

Justin (2003), as it is primarily restricted to Car Nicobar
Island, does not explain the units below or above chonong
uvayav in the indigenous system at Chowra.

3.2. Indigenous Island Council

As mentioned, two systems, the indigenous Island Council and
the PRI equivalent Tribal Council, work in tandem at Chowra
Island, according to the present study.

On the whole, the community has organized itself into five
parties called hokngook. Food crop farms form the basis of the
indigenous socio-political organization.

3.2.1. Komanaich: At the bottom of the native system is a unit
called komanaich. It is a unit of one or more nuclear families
who stay and work together to maintain a portion of the food
garden. The unit komanaich is also defined by its kitchen
house. In this sense of staying together, it is identified as one
household. According to island customs, even if more
than one or two nuclear families are grouped under a komanaich,
the number of hearths cannot be increased. Even if the
population of the komanaich increases, the number cannot be
increased.

The word for ‘portion’ in Sanenyo is konaich. The unit that
possesses it, in this case the family group in charge of working
in and maintaining it, is known as komanaich. This is the unit
mentioned in Reddy and Sudersen (1986) as yomkanala.
Kanala is a word in Sanenyo meaning a ‘share’. Yom means a
person or people, especially elderly people. So the terms
komanaich and yomkanala could be used interchangeably for
the social unit described above.

The unit definitely traces its origin to a common ancestor.
Exogamy is allowed. Patrilineal descent is reckoned but the
residence pattern can be either patrilocal or matrilocal. A
komanaich is identified by the male member who leads its
activities. The activities included in the criteria are building
houses, maintaining coconut farms, herding activities,
performing rituals etc. The assigning of the portion of the food
garden to the komanaich is done by the Head of the hokngook, the
party, through the Head of an intermediary unit called iydh
uvayav or chonong uvayav.

3.2.2. Chonong uvayav: This is a unit where a group of
komanaich is administered together by a Head. The phrase
chonong uvayav, literally translated, would mean ‘tree
coconut’. As seen in the introduction to this Section, Justin
(2003) identifies this unit as a family group with a common
(unknown) ancestor.

The chonong uvayav is identified by its Head. The Head is
from one of the member komanaich and is assisted by one
more person from a different komanaich. The Head is chosen
for leadership qualities, knowledge of customs and rituals,
skills in sailing, mastery of canoe making, and the building or
repairing of fences of food gardens etc. The administrative
authority of the Head of the chonong uvayav is directly
observed over the portions of land of the komanaich assigned
under the Head.

Fig. 1 Indigenous socio-political organization at Chowra

3.2.3. Hokngook: A group of six or more chonong uvayav
makes a hokngook, the party. The word hokngook means
‘food’ in the Chowra language. A hokngook is known by its first
captain. Three captains, in the order of first, second, and third,
are elected for each hokngook. Their term of office is
three years.

As mentioned before, the whole island population is
administered by attaching each individual in the island, through
the units of komanaich and chonong uvayav, to these five
parties. One is born into a hokngook at Chowra. Based on the
choice one makes between a patrilocal or matrilocal residence
for one’s family after marriage, the membership of the person
may change from one hokngook to another. It is noted that
members through the affinal route are generally not welcomed to
higher posts in their new hokngook.

What is noticeable is that the island community is not
equally distributed among these five hokngook. Alpha’s
hokngook, for instance, has only six chonong uvayav as against
Esaw’s hokngook, which has the highest number with ten.
Alpha’s hokngook is the smallest among the five. The
structural rigidity of this system is because of the restrictions
on the basic unit komanaich. As the number of household units
of the party, or for that matter, of the island, cannot be
increased, the hokngook do not literally grow in terms of
number of units. Population has increased noticeably in the
past century but the ratio between the hokngook seems to
remain constant. The hokngook thus becomes the most
important unit because of its decision-making authority on
maintaining the natural and human resources of the island.

3.2.4. Functioning of the Units: The major duty of a hokngook,
according to island tradition, is to organize panuho nut, the
celebration of eating pig. The responsibility rotates every
year so that each hokngook is in charge of organizing the
festival once in five years. The other four hokngook become the
laborers of the organizer for the year. As will be discussed in a
later section, the festival is about slaughtering as many pigs as
possible within a week for island consumption. The organizing
hokngook spends almost all of its adult pig wealth during this
week. Dancers and participants in rituals have to be honored
with food items as gifts. This involves a lot of tubers, coconuts,
and plantains. Forest produce is involved in building special
fences for the festival pigs, repairing houses etc. Money is
involved in the form of cloths as gifts and uniforms. In short,
all the economic resources of the organizing hokngook are
almost completely exhausted in a week. Since the onus of
decision making is on the first captain of the hokngook, there is
an assertion of the hokngook and its captain as the important
decision makers in the indigenous system of the island. Each
hokngook calculates its wealth and resources at the beginning
of every year. Captains of all hokngook gather and discuss the
status of resources with the organizer of the year. They decide
the course of work for each party through the year so that the
festival in the last quarter of the year is celebrated with pomp.
This is one reason that the hokngook control the distribution of
food gardens and the cultivation methods used in the island. Each
hokngook owns land and food gardens in the neighboring
Teressa Island too. Apart from the horticultural aspects, the
hokngook also plan activities like canoe building, pot making
etc. during the meeting.

Canoe building, though commissioned by the Head of the
hokngook, takes place mostly under the supervision of the
Head of the chonong uvayav. The workforce is gathered by the
Head of the chonong uvayav to go to Teressa Island for canoe
building. They also appoint a chief architect for the work and
follow the expert’s instructions on technical aspects. For pot
making, the Head of the hokngook plans when to leave for
Teressa Island to bring the mud required for the process, a plan
that is again implemented under the supervision of the Head of
the chonong uvayav. They also choose the women who
would make the pots, bake and color them etc. While all this
work is planned by the captains of the hokngook, the
decisions are authorized to be carried out only by the Council
of the Chief Captain and Deputies.

3.2.5. Chief Captain: The office of the Chief Captain is for a
life term. The Chief Captain may step down for health reasons.
The Chief Captain is selected on the basis of age, knowledge
of tradition and customs, administrative quality, personal
wealth etc. Unlike Car Nicobar, where the Chief Captain of the
Island is traditionally from one village, the Chief Captain at
Chowra can be from any one of the villages. No
historical record was made available to this research on the date
when the Chief Captainship system arrived at the island. But
the present generation remembers at least four Chief Captains
spanning over a century and a half.

The Chief Captain is assisted in the office by two deputies,
namely, the Vice Chief Captain and the Deputy Vice Chief
Captain. During the present study, Sri. Jonathan, Sri. Leslie
and Sri. Anthony Moses were the three Captains respectively.
Their terms too extend over their lifetime or for as long as their
health permits. Of the three Captains mentioned, the Deputy
Vice Chief, Sri. Anthony Moses, is the son of the former Chief
Captain, Late Sri. Moses Loklolul. Kauvvi, ‘the head’, is the
native equivalent for the three Chiefs of the Island. In fact, all
the captains, including those of the chonong uvayav and hokngook,
are addressed as kauvvi. Further distinction of the person is
made by adding their name. The Chief Captain of the island is
sometimes referred to as Ma kauvvi, as in Car Nicobar Island.

Fig. 1 gives the structure of the indigenous Island Council
of Chowra. The Chief Captain of the Council also becomes the
Chairman of the PRI equivalent Tribal Council established for
the purpose of assisting the District Administration.

4. Chiefdom
4.1. The Criterion

The presence of a person with the title ‘chief’ does not turn a
community into a chiefdom. Organization of
more than one group under a chief, the presence of social classes
and status differences determined by closeness to the chief
or the founding ancestors, hereditary transfer of social power,
permanency of the office of the political system, which outlasts
the individuals who occupy it, population pressure on food
production and its tackling in terms of increased production,
collection and redistribution of food material etc. are generally
considered the attributes of a chiefdom (Smith 1999). Service
(1978) lists the following as features of chiefdom. Put on an
evolutionary trajectory, families would still remain the base for
a society classified under chiefdom. This does not make it
egalitarian. Certain members will have more control over
goods and production but still there would not be true private
property, entrepreneurs or markets. Parts of the society seem to
have specialized skills and duties in a chiefdom. The
specializations bring in identifiable differentiation and
independence in the society. But still, there is central direction
and authority. There is, of course, no true government.

4.2. The Office of the Chief Captain 

It is the central leadership, and the way it influences life at 
Chowra Island, that actually makes the community a candidate to be 
classified as a chiefdom. The office of the Chief, the Vice Chief 
and the Deputy Vice Chief captains is at the helm, sanctioning 
decisions on issues of the island. They are a distinct body of 
political control in the island. As seen under section 3, most of 
the functional administrative decisions within the hokngook are 
taken by the chiefs of the hokngook. They consult the heads of 
the chonong uvayav in making the decisions, but there is always 
a need for approval from the Chief Captain. The office is 
permanent. In fact, literature on colonial activities in the 
Nicobar Islands has always mentioned the presence of the 
chief in them (Singh 2003). The members of the family of the 
Chief or the other captains do enjoy attention. However, it is 
most likely that they deserve it through their own merit. The 
office is not hereditary at Chowra. As mentioned before, the 
Chief Captain is selected based on his knowledge and ability. 
The position is always occupied by a male member. The captains of 
the hokngook are elected through votes. There is 
self-regulation within the hokngook in letting people take turns. 
There are times when a person is allowed a second turn. 

4.3. Economic Specialization 

Economic specialization is another thing that vouches for the Sanenyo 
community’s candidature for chiefdom. By economic 
specialization in the community, we do not mean the formation 
of farmers, carpenters, barbers, accountants, etc. as special 
classes or groups of people within the community. They may 
generally be the first signs of a non-egalitarian chiefdom as 
different from a tribe or a band. The Chowrites, overall, are 
actually known as the best canoe-makers, the only pot makers, 
and a dangerous group of shamans. The shamans could 
perform acts like destroying by fire, from the land, a canoe at 
sea, or identifying the presence of unwanted trees in the farm that 
bring diseases to the family, etc. In fact, for a long time the 
Chowrites controlled the whole Nicobar trade using these 
skills. The Chowra economy in fact sustained itself by bartering 
its canoes, pots and shamans’ assignments for pigs and cane 
products from other islands, until the advent of aluminum 
cooking vessels and motor boats. 

The Chowra economy, at present, is based totally on 
subsistence agricultural products, and on copra and areca nut that 
are exported. With the exception of the Car Nicobarese people, 
the Nicobarese of the Central and Southern groups own coconut 
farms in the neighboring islands too. The Chowrites own such 
farms on the northern and western sides of Teressa Island. 
As coconut and areca nut can be owned by individual members 
or an individual komanaich, the members decide on who 
should stay in the farms in neighboring islands for 
maintenance and copra production. They consult the heads of 
the chonong uvayav before they decide. These heads keep the 
heads of the hokngook informed about which of their members 
are on the island and which of them are in the neighboring 
island. Keeping the heads of chonong uvayav and hokngook informed is 
important because they are the ones who decide on canoe building, 
festival house repairing, festival activities and other annual 
events, which include even the agricultural farming on the 
island. The Chief Captain’s office plays a general monitoring 
role. The Chief Captain may decide that certain trading agents 
from outside who buy the copra and areca may not be 
engaged, and whether new ideas, like taking building contracts, can 
be taken up or not, etc. 
The categorization of the activities of life into subsistence-related 
and export-trade-related ones, and their control or maintenance 
through a political leadership, provides a clear picture 
of economic specialization at Chowra. 

4.4. Collection and Redistribution 

Collection and redistribution of food products are not 
exclusive to chiefdoms. Bands and tribes have always been known to 
do it on many occasions. However, in those societies, it is mostly 
unplanned, or at least not planned on a big scale like for over 
six etc. Hunted game was mostly shared with kin groups 
among tribes and bands. The planning of tuber farming in 
fenced gardens is evidence of large-scale production 
planning by the political authorities. The particular phase of 
the festival panuho nut when food items like tubers, banana 
and coconut are brought to the festival house and displayed there 
for three days before being distributed to the dancers and other 
participants is a visual exhibition of redistribution in the Chowrite 
community. 


4.5. Other Indicators 
conform to all the norms set by theory. Chowrites, for example, 
number only 1376, a number lower than the 5000 with which 
chiefdom populations normally begin. The categorical 
appointment of an ethnic priest in chiefdom communities too is 
not seen in the present Chowrite society. Male and female 
shamans were present until the recent past. During the festival 
panuho nut one can see the appointment of a chief shaman who 
remains in the festival hut throughout the fortnight-long 
celebrations, performing the rituals. Through most of its 
history, it also seems that the offices of the chief and the chief 
shaman remained separate, but the chiefs consulted the 
shamans before taking major decisions to find out whether the spirits on 
the island agreed or not. Children of these ethnic healers and 
priests became shamans too, in most cases. After the arrival of 
modern education and Christianity, the role of shamans has 
been reduced to that of traditional healers who possess the 
medicinal knowledge of the local flora. It is still the case that 
their children too pick up this knowledge from their parents. 
Nevertheless, the trend is declining. 

A few other aspects of Chowrite culture justify a chiefdom 
classification of the community at Chowra. The presence of a 
ritual-festival complex, an panam, is one. Specialization in pot 
making, in which phases are assigned for bringing mud, 
pounding the mud, making the pot and baking the pot, the 
different roles taken by men and women in the process, and the 
presence of experts for painting the pots, etc. are some others. 
They all add to the argument that the chiefdom classification 
fits the society best. Even if there is specialization and unequal 
distribution of power and authority at Chowra, what Triandis, 
H. and M. Gelfand (1998) would call vertical collectivism, the 
values of life are mostly shared by the members of the 
community. 

5. Metaphorical Understanding of Chiefdom 
5.1. Metaphor - Definition: 

In a general sense, metaphors are said to be that aspect of 
language where one subject is described to be another when 
the subjects are not naturally related. There is analogy and 
equation. Goatly (1997: 8) defines a metaphor broadly like this, 

‘Metaphor occurs when a unit of discourse is used to 
refer unconventionally to an object, process or 
concept, or colligates in an unconventional way. And 
when this unconventional act of reference or 
colligation is understood on the basis of similarity, 
matching or analogy involving the conventional 
referent or colligates of the unit and the actual 
unconventional referent or colligates.’ 

Following is a working definition of metonymic relationship 
from Knowles and Moon (2006:47) that may also be relevant 
for the discussion on the socio-political units of Chowra Island. 

‘Metonymy ... a specific kind of figurative 
language ... involving either part-and-whole 
relations ... or else naming by association’. 

5.2. Metaphorical and Metonymical Associations of the Units: 


The following is the list of the terms for the units in the 
indigenous socio-political organization. 


Names of the units or participants | Meaning | Status under the indigenous socio-political organization 
---------------------------------------------------------------------------------------------------------- 
Komanaich (smallest unit) | Agent (person, people, animals or objects) associated to any portion | Nuclear family unit which maintains a portion of the food garden. 
chonong uvayav (next higher unit) | Tree Coconut | A unit consisting of three or more Komanaich. 
hokngook (next higher unit) | Food | A unit consisting of six or more chonong uvayav. 
kauvvi | Head | Head of any of the units of the socio-political organization. 
ma kauvvi | Tall/Big head | Chief captain of the political organization. 


Table 1 


The word for the first unit in Table 1, Komanaich, has a 
meaning that is very close to its core meaning or its first 
referent. As mentioned in section 3.2.1., Komanaich is a word 
paired with Konaich meaning ‘a portion’. Restriction of the 
senses of ‘portion’ to the ‘portion of the food garden’ and of ‘people’ 
to ‘the nuclear family’ are the two main changes happening during 
the entry of the word komanaich into the lexicon of political 
organization at Chowra. Supplementary connotations added 
during the process are the komanaich’s responsibilities in the 
maintenance of the portion, their affiliation to the particular 
hokngook, ownership of the hokngook over the garden, 
authority of the hokngook in dividing the garden into sections 
to be assigned to its people and the headship of the kauvvi. 

Chonong uvayav (chonong = tree, uvayav = coconut), the 
immediately higher socio-political unit at Chowra, has a 
metaphor for its name. There is zero literal correspondence 
between a chonong uvayav and a group of nuclear families that 
may be socially and politically controlled by a selected 
member of the group. The families under the unit are usually 
believed to share a common ancestry. A mapping of structural analogy 
is made between an un-branched chonong uvayav and a group 
of families from a common ancestor. It is an equation outside 
its domain. Like a typical metaphor, its interpretation requires 
a search for meanings not determined by language, logic or 
experience. It demands effort and imagination on the hearer’s 
part to establish the analogy. 

However, if the Komanaich units were also to be taken into 
consideration, as they fall under the headship of the 
socio-political unit, then a gap results in the mapping between the 
chonong of the natural world and chonong uvayav, the 
socio-political unit. This leads one to question the choice of chonong 
uvayav by the community as the name of the unit. A chonong 
uvayav is not the only tree without branches that is commonly 
available in the island. Plenty of chonong hiydh, ‘tree areca 
nut’, which does not branch out either, are found in the island. 

Tree is a metaphor in many languages, for instance, Tamil 
and English. In these two languages, there is a 
root analogy between the source ‘tree’ and its target. Most of 
the parts and uses of the source may be used as further 
metaphors for things associated with the target sharing 
conceptually similar structural relationships. The present study 
did not have enough tools to check elaborately the exploitation 
of the metaphor in Sanenyo common speech. Generally, it was 
observed that the chonong uvayav, the unit, was neither 
‘young’ nor ‘old’, did not ‘die’, did not join others to become a 
‘grove’, and was not required to ‘bear fruits’. No one has 
drunk its ‘water’ and its ‘roots’ remained unknown. It was not 
‘swelling’ or ‘growing’. Restriction on the last proposition is 
natural in the Chowravian case. As mentioned in section 3.2.1., 
the number of units in the socio-political organization remains 
fixed even if the population increases. 

Hokngook is the topmost socio-political unit at Chowra. 
The word hokngook is polysemous in Sanenyo, serving as the noun 
‘food’ and the verb ‘eat’. When used for the socio-political unit, it is used in 
the sense of the noun. As in the case of chonong uvayav, 
hokngook too, as the target, shows little literal correspondence 
to the source in isolated contexts. Unlike chonong uvayav, 
where the demand was on domains outside its own for the 
interpretation, correspondence can be achieved in the case of 
hokngook if knowledge of the Chowra socio-political system is 
available to the hearer. 


As mentioned in section 3.2.4, the main assignment of the 
hokngook seems to be the organization of panuho nut, the pig 
festival, and other festivals. However, the events and activities 
related to the festivals, particularly panuho nut, are such that 
the entire life at Chowra seems to be controlled by or directed 
towards the activities of the festivals. Thus, one of the major 
responsibilities of the hokngook becomes the maintenance of 
food resources. Maintenance of food resources naturally 
includes control over land, in this case, the food gardens, and 
human resources. There is continuity, and both the term 
hokngook and its referent here, the highest socio-political unit 
of the community, seem to fall within the same domain. The 
continuity between the name hokngook and the socio-political 
unit is that hokngook derives its name from one of its activities, 
that is, to maintain the food resources. In fact, the association 
is to the product of its activities, hokngook. There is contiguity 
within the domain. This is purely a case of metonymy. 

Kauvvi and Ma Kauvvi, the terms for any captain of any 
unit and the Chief Captain respectively, seem to have a 
figurative analogy that is universal and found across languages. 
Following is the etymological account of the English word 
‘captain’ given in the Concise Oxford Dictionary (Tenth 
Edition). 

‘ORIGIN ME: from OFr. capitain (superseding 
earlier chevetaigne ‘chieftain’), from late L. 
capitaneus ‘chief’, from L. caput, capit- ‘head’.’ 

Even the English word ‘Chieftain’, which may appear as a 
contracted ‘Chief Captain’, is a word in itself and had its origin 
in the Late Latin form capitaneus ‘chief’. One can say the 
word ‘Captain’ is a dead metaphor which evokes reference, 
among present-day speakers of English, to a chief or a 
leader without invoking the idea of a physical head. In fact, the 
superficial similarity between the words ‘cap’ and ‘captain’ is 
because they both trace their origin to the Late Latin form 
‘caput’. The adjective Ma ‘tall/big’ too exhibits, both 
structurally and ontologically, a universal appeal. Higher 
offices with important responsibilities in hierarchical 
schemes are generally perceived to be taller or bigger than the 
others. 

5.3. Implications of the Metaphors and Metonyms 

The term hokngook from the socio-political organization is a 
metonym which shares the domain of its first referent, the 
food and the people and activities related to it. The term 
Komanaich is a word from the language whose scope has been 
restricted and its referents specified. Use of the metaphor 
chonong uvayav in the socio-political organization of Chowra 
does not exhibit any characteristics of a root analogy or a 
conceptual metaphor. Neither the names of the units that govern 
chonong uvayav nor those of the units governed by them establish 
a relationship with it in a bigger schema. Even individual 
participants like the Kauvvi or Ma Kauvvi are not ‘heads’ of a 
chonong. The structural gap in the mapping, with particular 
reference to chonong uvayav, gives rise to two questions: first, 
the origin of the indigenous socio-political organization at 
Chowra and, second, the choice of chonong uvayav for the unit 
of common ancestry. 

6. Afterthoughts on the Indigenous Socio-Political 
Organization 

6.1. The genesis and the References from History 

Not much is known about the origin of this organization at 
Chowra. In its present form and in its entirety, it is certainly 
different from the socio-political organization of the Southern 
Nicobar Islands, Nancowry and Car Nicobar. There is one 
level of similarity between the units of the socio-political 
systems at Chowra and Car Nicobar that has been noticed in the 
studies reviewed in Section 3.1. Both Reddy and Sudersen 
(1986) and Justin (2003) mention chonong uvayav as an 
equivalent to the tuhet at Car Nicobar. Tuhet is a unit that is 
defined by common ancestry at Car Nicobar. Hokngook 
and Komanaich do not have equivalents in other islands. The 
office of the chief captain of the island is mentioned in the 
histories of all the islands of the Nicobars. The chief captains were 
the ones who formally welcomed the European visitors on 
most occasions (Singh, S. J. 2003). Even if one has to extract 
inferences from the story of the establishment of the first school at 
Chowra (Reddy, G. P. 2003), one comes across a captain of the 
island, members of a lineage, a group of shamans and a head of 
the group of shamans. 

Two important historic observations are available in 
Bonnington (1931). First, the Captain was the most powerful 
administrator of the island at Chowra, and this was considered an 
exception in the context of the Nicobars in those times. Decisions 
on disputes were taken by the Captain, who might consult the 
elders on the type of fine or punishment to be ordered on the 
offenders. Internal governance in other islands of Nicobar was 
more a collective effort, with the elders of the island taking 
many decisions. The second and the most important one is that 
during ‘the great ossuary feast... the whole island community 
resort to the al(au) panam or the village near the shore... all 
the pig in the island are... killed and eaten... exhausts the 
complete stock of pork on the island...’ (Bonnington 
1931:183). The first note refers to the Captain as the single 
highest authority, and the second one mentions the ossuary 
festival where the whole island community participates, 
slaughtering all the pigs of the island. 3 

6.2. Implications of the References 

The great ossuary festival mentioned in Bonnington (1931) is 
the predecessor of the present-day panuho nut, the festival 
where one of the five hokngook is the organizer, the others its 
laborers, and the organizing hokngook slaughters all its pig 
wealth. The major difference between the present and the past 
forms of this festival is this. Earlier, it 
celebrated the digging of bones of the dead from the 
graveyard. In the present context, it is panuho nut, literally 
translated, an ‘eat pork’ festival. The idea of the ossuary, if it 
is there, is only symbolic (Cruz S. W. Forthcoming). Relevant 
to our discussion is the absence of any mention of a unit or a 
concept like the hokngook in the celebration or organization of 
the festival. In the past, the whole island participated and all 
the pig wealth of the island was exhausted. This leads one to 
believe that replenishing the pig wealth for the coming year 
must not have been a daunting task. At the least, there must 
have been less fear about a dry year. In the present form, 
though the whole island celebrates, there are distinct roles for 
the hokngook units as organizer and laborers. It is only the 
organizing hokngook which slaughters all its pig wealth. 

3 Another important reference that could have given some more value to this topic is 
Sahay, V.S. 1976. ‘Traditional System of Inter-Island Trade in the Nicobar Archipelago’, 
Journal of Social Research, Vol. XIX, No. 2. 

6.3. Afterthoughts — The New Scenario 

The picture that emerges from the inferences of these historic 
accounts is this. Chonong uvayav and Ma Kauvvi are the two 
traditionally recognized socio-political units of the Chowra 
socio-political organization. Komanaich and hokngook are newly 
formed units that seem to have evolved in response to the 
growing population, scarcity of land and the burden on 
economic resources. When maintenance of these became a 
matter of utmost importance, the socio-political organization 
underwent a modification. Status was attributed to the nuclear 
families as komanaich units and the concept of a higher body, 
the hokngook, was introduced. Since the environment that gave 
rise to these units stressed food production and maintenance 
of land in the form of food gardens, names that have contiguity 
with their functions and produce, particularly in the case of 
hokngook, appear natural. 

7. Choice of chonong uvayav 

‘Foregrounding importance’ appears to be one of the basic 
principles in the Chowravian perspective of life. When common 
ancestry was the major question with reference to identity, 
function and social relevance in an island of many lineages, the 
community seems to have addressed the lineage units by one 
of the most important objects and symbols of their island 
life, the chonong uvayav. Ap, the canoe, kariong, the earthen 
pot, nyi hipul, the traditional hut, and nut, the pig, could have 
been the other natural contenders, each one having its own 
importance in its respective domain. The criterion through 
which the chonong uvayav has emerged as the most suitable 
candidate is unknown. That uvayav ‘coconut’ is important both 
as subsistence food and as a trade commodity could have helped 
it win the challenge from objects like nyi hipul, but this can only be 
speculation. Uvayav and kariong, for instance, shared 
exclusive spaces as objects of external and internal trade 
respectively. Kariong trade within the Nicobars, until recently, 
defined the economy of Chowra. But kariong-making is the 
best kept secret of the Chowrites, and using it as a symbol in a 
public sphere might have had its own inhibitions. This can 
also be the reason why it is not represented in the flags of the 
youth parties of each village in the island during the 
inauguration of Christmas celebrations. 


Fig 2 is the symbol on the flags of youth groups from each of 
the five villages at Chowra during the inauguration of Christmas 
carols in the island. Four of the five important symbols of 
Nicobarese life mentioned in the above passage find their place 
here. Kariong is conspicuous by its absence ... symbol in the 
same order. This can be taken as support for our 
argument that the choice of 
chonong uvayav as the metaphor for a lineage with common 
ancestry is not because of any structural or conceptual analogy 
but because of one of the basic principles of the Chowravian 
worldview, that is, foregrounding importance. 



8. Conclusion 

Probing the metaphorical understanding of the socio-political 
organization at Chowra introduces one to the worldview of 
Chowravian culture, in general, and the history of the system, 
in particular. The indigenously developed council at Chowra 
reflects one of humanity’s soulful battles against low resources 
and trying conditions. It has evolved out of a valuable 
discernment of the island’s natural restrictions. The 
Chowravian understanding is that the limitations on their 
nature are real. It is beyond human ability to augment 
nature. The situation is peculiar to Chowra even within the 
islands of Nicobar, where resources were generally considered 
to be plentiful, even to the point of making the people look lazy (Kloss 
1902:227). The Chowravian response to this predicament was 
a prudent attempt to conserve available resources and enhance 
stable production from them. 

The elegance of the system is suggested by the 
metaphorical and metonymical terms used for the units. It also 
combines tradition with evolution. While the traditionally found 
metaphor, chonong uvayav, gives the perspective of the culture 
of ‘foregrounding importance’ in Chowravian tradition, 
Komanaich and hokngook, the latter in particular, bring in a 
metonymical continuity between the unit and its 
responsibilities. The metaphorical and metonymical distinction 
between the first and the latter two explains the two historically 
separate origins of the units. 

PERSONAL PRONOUNS IN MUOT 


V.R.Rajasingh 

ABSTRACT 

Pronouns, as a sub-class of nouns, are known for 
their significance in the communication process of 
any language. As substitute for nouns, they fulfill 
morphological function. As constituents of phrases 
and sentences they perform syntactic function. Muot 
refers to that variety of Nicobarese language being 
spoken by the natives of Katchal, Kamorta, 
Nancowry and Trinket of the Central Nicobars, India. 
Exploring, to a possible extent, the lexical and formal 
functions found in Muot as manifested by its personal 
pronouns would be the aim of this paper. Data for the 
investigation is drawn from the Andaman 
Commissioned Project database collected from 
Nancowry Island between September and December 
of 2004, just before the killer tsunami. 


1. Aim 

The paper aims to identify, classify and to some extent discuss 
the syntactic functions of personal pronouns in Muot. 

2. Muot: Muot, the indigenous name of Nancowry Island, here 
refers to one of the six languages which the ethnic tribal 
community, the Nicobarese of the Nicobar Islands of India, 
speaks. 1 It is the mother tongue of a total of 5826 natives who 
inhabit the four islands, Katchal, Kamorta, Nancowry and 
Trinket of the Central Nicobars. 2 The language is said to belong to 
the major Austro-Asiatic family of languages through its 
Mon-Khmer sub-family. 

3. Pronouns 

Pronoun is ‘a term used in the grammatical classification of 
words, referring to the closed set of items which can be used to 
substitute for a noun phrase or a single noun’ (Crystal 2002). 
Pronouns are ‘a special case of the more general linguistic 
category, ‘pro-forms’, i.e., function words that replace 
syntactic units of a particular category ... pronouns tend to 
have a fairly simple phonological structure, normally one or 
two syllables’ (Brown 2006). Being a sub-class of nouns, 
pronouns share the categorial function with nouns. 

3.1. Personal Pronouns 

Personal pronouns are those pronouns which, besides standing 
for nouns, distinguish the participants in a speech act 
into the person speaking, the person spoken to and the person 
spoken of in terms of the relative spatial distance between them. 

3.1.1. Personal pronouns in historical works 

Personal pronouns with such distinct semantic denotation did 
not escape the attention of the men of yesteryear who produced 
pioneering works on the language over and above their 
assigned governmental duties. This paper looks at 
the treatment which the personal pronouns received in some of 
the earlier works on the language. 

3.1.1.1. Personal pronouns in deRoepstorff (1884) 

In this monumental work, through his introduction to the 
grammar, deRoepstorff writes ‘personal pronouns in 
Nicobarese are of three persons, and of three numbers - 
singular, dual and plural, but with no distinction of gender or 
case’. The book lists the personal pronouns identified in the 
language under three heads corresponding to the three persons 
as given below: 

1st person 
    singular   tiue, tie, ie 
    dual       tieae 
    plural     tieoi 

2nd person 
    singular   me 
    dual       ina 
    plural     ifae 

3rd person 
    singular   anash, ninne (demonstrative pronouns) 
    dual       ona 
    plural     ofe 

Table-1 below would provide a cursory look at them. 

Table-1 

Number   | First person  | Second person | Third person 
-------------------------------------------------------- 
Singular | tiue, tie, ie | me            | anash, ninne 
Dual     | tieae         | ina           | ona 
Plural   | tieoi         | ifae          | ofe 


3.1.1.2. Personal pronouns in Man (1889) 

In this important work, through his notes on the grammar, Man 
writes ‘the personal and possessive pronouns are chiefly 
remarkable for the expression of the dual number. In common 
with adjectives and substantives they are uninflected for 
gender, while case is indicated in the former by the same 
means as that adopted with substantives’. The work lists the 
following as the personal pronouns: 


chiia, cha          ‘I’ also ‘my’; 
men, me             ‘thou’ also ‘thy’; 
an, na              ‘he, she, it’ also ‘his, her, its’; 
hen, chaai          ‘we’ (dual), also ‘our’ (dual); 
he, chioi           ‘we’ (of three or more), also ‘our’ (of three or more); 
yol-chiia, yol-cha  ‘we’ (of a community), also ‘our’ (of a community); 
ina                 ‘you’ (dual), also ‘your’ (dual); 
ife                 ‘you’ (of three or more), also ‘your’ (of three or more); 
yol-men, yol-me     ‘you’ (of a community), also ‘your’ (of a community); 
ena                 ‘they’ (dual), also ‘their’ (dual); 
ofe                 ‘they’ (of three or more), also ‘their’ (of a community); 
yol-an              ‘they’ (of a community), also ‘their’ (of a community). 


Unlike deRoepstorff (1884), the work didn’t present the items 
under any heads. However, they can be grouped in the 
following manner as shown in Table-2. 

Table-2 

Number     | First person       | Second person   | Third person 
------------------------------------------------------------------ 
Singular   | chiia, cha         | men, me         | an, na 
Dual       | hen, chaai         | ina             | ena 
Plural     | he, chioi          | ife             | ofe 
Collective | yol-chiia, yol-cha | yol-men, yol-me | yol-an 


3.1.2. Personal pronouns of the current study 


The data under analysis attest the following personal pronouns: 


cheuon, chon   ‘I’ 
cm ai          ‘we-dl (excl)’ 
hank           ‘we-dl (incl)’ 
chi ooi        ‘we-pl (excl)’ 
hek            ‘we-pl (incl)’ 
men            ‘you-sg’ 
inaii          ‘you-dl’ 
ife            ‘you-pl’ 
ohn            ‘he / she / it (sg)’ 
unah           ‘they-mas, fem, neut (dl)’ 
ufe            ‘they-mas, fem, neut (pl)’ 

3.1.2.1. Morphological function 

Morphologically, all these personal pronouns can be analyzed 
into singular personal pronouns, dual personal pronouns and 
plural personal pronouns on the basis of the number of the noun 
for which they stand. 

3.1.2.1.1. Singular personal pronouns 

These are personal pronouns for which the numerical value of the 
noun they stand for is one. Depending upon the 
role of the participant in the speech act (cf. 3.1) to which the 
pronoun refers, these singular personal pronouns are 
classified into three types: first person singular pronouns, 
second person singular pronouns and third person singular 
pronouns. 

3.1.2.1.1.1. First person singular pronouns 

These are singular personal pronouns for which the noun they 
stand for refers to the speaker in a speech situation. Two 
such pronouns are identified in the data. They are the 
following: 

cheuon   ‘I’ 

chon     ‘I’ 

They are monomorphemic words, found to be 
morphological free variants expressing one and the same 
meaning. The following sentence illustrates their 
syntactic occurrence. 

yutikos   in     cheuon 
Eutychus  sp.wd  I 
‘I am Eutychus’. 

3.1.2.1.1.2. Second person singular pronouns 

These are singular personal pronouns that stand for a noun
referring to the person being spoken to in a speech situation.
Only one such pronoun is identified in the data:

men ‘you-sg’

It is a monomorphemic word. The following sentence illustrates
its syntactic occurrence.

hev ko kinyom in men
see sp.wd child sp.wd you-sg
‘You-sg see the child’.

3.1.2.1.1.3. Third person singular pronouns

These are singular personal pronouns that stand for a noun
referring to the person being spoken of in a speech situation.
Only one pronoun of this kind is identified in the data:

ohn ‘he-sg / she-sg / it-sg’

It is a monomorphemic word. The following sentence illustrates
its syntactic occurrence.

reukto in ohn tin enh
comes sp.wd he here
‘He comes here’.

Besides standing for human nouns, it also stands for non-human
nouns. The following sentence illustrates its non-human
reference.

hat vT 6 in ohn
neg work sp.wd it
‘It doesn’t work’.


3.1.2.1.2. Dual personal pronouns

These are personal pronouns that stand for nouns whose
numerical value is two. Based on the role in the speech act
(ibid.) of the participants for which the pronouns stand, the
dual personal pronouns are grouped into three types: first
person dual pronouns, second person dual pronouns and third
person dual pronouns.

3.1.2.1.2.1. First person dual pronouns 

These are dual personal pronouns that stand for nouns
referring to the speakers in a speech situation. Dual personal
pronouns of this type are found to refer to the speakers both
inclusively and exclusively. On that ground, they are
classified into inclusive first person dual pronouns and
exclusive first person dual pronouns.

3.1.2.1.2.1.1. Inclusive first person dual pronouns 

These are first person dual pronouns in which the two nouns
for which they stand include a noun from the second person
(i.e., the person spoken to). The data attest only one pronoun
of this type:

hank ‘we-dl (incl)’

It is a monomorphemic word.

3.1.2.1.2.1.2. Exclusive first person dual pronouns 

These are first person dual pronouns in which both nouns for
which they stand are from the first person itself (i.e., the
person speaking). The data attest only one such pronoun:

chi ai ‘we-dl (excl)’

It may be a bimorphemic word, since it appears to consist of
two morphemes. The following sentence exemplifies its
syntactic occurrence.

hev in men in chi ai
see sp.wd you-sg sp.wd we-dl
‘We-dl see you’.

3.1.2.1.2.2. Second person dual pronouns

These are dual personal pronouns in which the two nouns for
which they stand refer to the persons being spoken to in a
speech situation. Only one such pronoun is identified in the
data:

inaii ‘you-dl’

It may be a bimorphemic word, since it appears to consist of
two morphemes. The following sentence exemplifies its
syntactic occurrence.

yo dering in inaii
going Dering sp.wd you-dl
‘You-dl are going to Dering’.

3.1.2.1.2.3. Third person dual pronouns

These are dual personal pronouns in which the two nouns for
which they stand refer to the persons being spoken of in a
speech situation. The data are found to have only one such
pronoun:

unaii ‘they-mas, fem, neut (dl)’

It may be a bimorphemic word, since it appears to have two
morphemes. The following sentence illustrates its syntactic
occurrence.

top in reak in unaii
drinking sp.wd water sp.wd they-hum-dl
‘They-hum-dl are drinking water’.

It also stands for non-human nouns, as can be seen from the
following sentence:

romo in unaii
take rest sp.wd they-nonhum-dl
‘They-nonhum-dl take rest’.

3.1.2.1.3. Plural personal pronouns

These are personal pronouns that stand for nouns whose
numerical value is more than two. Based on the role in the
speech act (ibid.) of the participants for which the pronouns
stand, the plural personal pronouns are grouped into three
types: first person plural pronouns, second person plural
pronouns and third person plural pronouns.

3.1.2.1.3.1. First person plural pronouns 

These are plural personal pronouns that stand for nouns
referring to the speakers in a speech situation. Personal
pronouns of this type are found to refer to the speakers both
inclusively and exclusively. On that basis, they are termed
inclusive first person plural pronouns and exclusive first
person plural pronouns.

3.1.2.1.3.1.1. Inclusive first person plural pronouns 

These are first person plural pronouns in which the nouns for
which they stand include nouns from the second person (i.e.,
the person spoken to). The data attest only one pronoun of
this type:

hek ‘we-pl (incl)’

It is a monomorphemic word. The following sentence exemplifies
its syntactic occurrence in the language:

tulong hot se an kepten hek
good sp.wd captain our
‘Our captain is good’.

3.1.2.1.3.1.2. Exclusive first person plural pronouns 

These are first person plural pronouns in which all the nouns
for which they stand are from the first person itself (i.e.,
the person speaking). The data attest only one such pronoun:

chi ooi ‘we-pl (excl)’

It may be a bimorphemic word, since it appears to consist of
two morphemes. The sentence below illustrates its syntactic
occurrence:

in on not in chi ooi
rear pig sp.wd we-pl-excl
‘We (pl) rear pigs’.

3.1.2.1.3.2. Second person plural pronouns

These are plural personal pronouns that stand for nouns
referring to the persons spoken to in a speech situation. Only
one such pronoun is identified in the data:

ife ‘you-pl’

It may be a bimorphemic word, since it appears to consist of
two morphemes. The sentence given below illustrates its
syntactic occurrence.

son store in ife tin enh
come sp.wd you-pl here
‘You (pl) come here’.

3.1.2.1.3.3. Third person plural pronouns 

These are plural personal pronouns that stand for nouns
referring to the persons being spoken of in a speech situation.
The data are found to have only one such pronoun:

ufe ‘they-mas, fem, neut (pl)’

It may be a bimorphemic word, since it appears to have two
morphemes. The following sentence illustrates its syntactic
occurrence.

Onge in ufe to ngang e
go sp.wd they-hum-pl there
‘They (hum-pl) go there’.

Besides human nouns, it also stands for non-human nouns, as
can be seen from the following example:

isari upyuap an ufe
grazing sp.wd they-nonhum-pl
‘They (nonhum-pl) are grazing’.

Table-3 provides an overall picture of the classification of
the personal pronouns discussed in sections 3.1.2.1 to
3.1.2.1.3.3.

Table-3

                             Dual                     Plural
Person   Singular        Inclusive  Exclusive    Inclusive  Exclusive
First    cheuon, chon    hank       chi ai       hek        chi ooi
Second   men                   inaii                    ife
Third    ohn                   unaii                    ufe


3.1.2.2. Syntactic function 

Syntactically, the personal pronouns of the language are found 
to function in the following ways: 

3.1.2.2.1. Personal pronouns as subject and object of sentences 

As constituents of a sentence, they function as subjects and
objects.³ The following two sentences can be taken for
illustration:

urikngdfah an sichua in ohn
kills sp.wd bird sp.wd he
‘He kills the bird’.

hev in men an kinyom
sees sp.wd you-sg sp.wd the child
‘The child sees you-sg’.

In the former, ohn ‘third person singular pronoun’ functions as
subject, and in the latter, men ‘second person singular pronoun’
functions as object. As subject, the pronoun occurs at the end
of the sentence; as object, it occurs in the middle of the
sentence, next to the verb. As seen in the above sentences,
when functioning as subject or object the pronouns are preceded
by a spatial word.

3.1.2.2.2. Personal pronouns as sentence initial constituent 

The normal word order of the language appears to be VOS, as
exemplified by the sentences given in section 3.1.2.2.1.
Pronouns functioning as the subject of a sentence occur as
sentence-final constituents. Under negation, however, the
pronoun apparently moves to sentence-initial position along
with the negative word.⁴ The following sentence illustrates
this.

unaii nit isaii nguat
they-dl neg eat coconut
‘They-dl do not eat coconut’.

The positive form of the sentence, before negation is applied,
would be the following:

isah nguat an unaii
eat coconut sp.wd they-nonhum-dl
‘They-nonhum-dl eat coconut’.

3.1.2.2.3. Personal pronouns as modifiers in noun phrases 

As modifiers of nouns, personal pronouns occur in the Head +
modifier pattern and convey possession. The sentence given
below illustrates this function.

hat ot in chuk mamiloh to oal matai chi ooi
neg sp.wd playground in village our
‘There is no playground in our village’.

In the above sentence, chi ooi ‘exclusive first person plural
pronoun’ functions as modifier of the noun oal matai
‘village’.

3.1.2.2.4. Personal pronoun as word of plurality in noun
phrases

The third person plural pronoun is found to function as the
word indicating plurality in noun phrases. This function can
be seen in the following sentence:

leat kichal ufe ko kinyom
past swim pl sp.wd child
‘The girls have swum’.

In the above sentence, ufe ‘third person plural pronoun’
functions as a word of plurality.

3.1.2.3. Findings 

1. The personal pronouns of Muot are either monomorphemic
or bimorphemic forms.

2. They mark three numbers: singular, dual and plural.

3. They do not denote gender.

4. The nominative form itself functions as the objective form.

5. Present-day Muot marks the inclusive/exclusive distinction
in the dual and plural pronouns of the first person.

6. They function as subject and object of sentences in the VOS
pattern.

7. As modifiers they occur in the Head + modifier pattern.

NOTES

1. The other five languages are Pu, Sanenyo, Luro, Lamongse
and Takahanyilang.

2. The population figure is based on the Census data of 2001.
Other inhabited islands of the Central Nicobars are Bampuka,
Teressa and Chowra.

3. As per the present data, the pronouns hank and hek appear to
be exceptions to this general observation. However, in the
sentence given in section 3.1.2.1.3.1.1, hek is found to
occur as a constituent of the subject noun phrase.

4. The third person singular pronoun seems to be an exception
to this general observation, as can be seen from the latter
sentence in section 3.1.2.1.1.3.

Abbreviations

cf.       Compare with
excl      Exclusive
dl        Dual
fem       Feminine
hum       Human
ibid.     In the same source
incl      Inclusive
mas       Masculine
neg       Negative
neut      Neuter
nonhum    Non-human
pl        Plural
sg        Singular
sp.wd     Spatial word


Tentative phonemic inventory of present day Muot 
Vowels 


Vowels 

Front 

W«- 

U ULA, 




Spread 

Round 

High 

i T 


eu eu 

u u 

High-mid 

e e 



o 6 

Mid 


6 66 



Low-mid 

e e 



6 6 

Low 




a a 


Diphthongs eui eui 


Consonants 
Bilabial

Labio-dental

Alveolar

Dental

Retroflex

Palatal

Velar

Uvular

Plosive 

P 



t 



k 


Nasal 

m


n 



ny 

"B 

ii 

Tap



V 






Fricative 


f 


s 



h 


Affricate 






ch



Approximant


V 



i 

y 



Lateral 

Approximant 



i 

THE SHOMPEN OF GREAT NICOBAR
ISLAND: NEW LINGUISTIC AND GENETIC 
DATA, AND THE AUSTROASIATIC 
HOMELAND REVISITED 

George van Driem 
Leiden University, Leiden 

ABSTRACT 

The Shompen are indigenous foragers living on 
Great Nicobar Island. The Danish scholar F.A. de 
Roepstorff was the first to record ethnographic and 
linguistic data on Nicobarese people and their 
languages in 1876. Yet data on the Shompen have 
always been rare. In a small book which appeared in 
2003, Subhash Chandra Chattopadhyay and Asok 
Kumar Mukhopadhyay made a considerable body of 
Shompen data available for the first time. All these 
data were studied in Amsterdam this year by Roger 
Blench of Cambridge. His comparison of the 
Shompen data with Nicobarese and Austroasiatic 
lexical resources is slated for publication in Mother 
Tongue. The new data and comparative study have 
changed our view of Shompen. How can we assess 
the Shompen data and, in particular, various claims 
that have been made about Shompen? Is Shompen a 
Nicobarese language or, indeed, even Austroasiatic? 

Is Shompen a language isolate? What are the possible 
implications of the Shompen data for ethnolinguistic 
prehistory? What new questions do these data compel 
us to address? 

Introduction 

Recently, Roger Blench rendered the valuable service of
making a newly available Shompen data set more widely
accessible. On the basis of those new data, Blench put forward
the new and interesting idea that Shompen might represent a
language isolate. Here a modicum of other newly available
Shompen data collected by the late Elangaiyan is made more
widely accessible. The earlier conjecture concerning the
independent phylogenetic status of Shompen, however, is
called into question. The view presented here is that Shompen
is still likely to be just another language of the Nicobarese
subgroup within the Nico-Monic branch of Austroasiatic.

The Nicobars and Austroasiatic 

The Nicobars form an archipelago between the Bay of Bengal 
and the Andaman Sea, located to the south-southeast of the 
Andaman Islands and just north-northwest of the northern tip 
of Sumatra. Whereas the languages of the Andamans have no 
known linguistic relatives anywhere else in the world, the 
Nicobarese languages constitute a sub-branch within the Nico- 
Monic or Southern Mon-Khmer branch of the Austroasiatic 
language family, as shown in Diagram 1. The Mon-Khmer- 
Kolarian language family was first recognised in the middle of 
the 19th century by Francis Mason (1854, 1860) and renamed 
Austroasiatic at the beginning of the 20th century by the 
Austrian Jesuit priest Wilhelm Schmidt (1904, 1906). 

The languages of the Nicobarese subfamily are spoken by a
little over 20,000 people on the Nicobar Islands. The specialist
literature contains Nicobarese language names that generally
resemble the names provided by Heinz-Jürgen Pinnow (1959).
A research group led by V.R. Rajasingh conducted a pilot study
in 2002 which identified new language names and grouped
related speech varieties together as ‘dialects’.¹ In
the northern portion of the archipelago, Pu: or Pu is spoken on
Car Nicobar Island, and Totet or Sanenyo is spoken on Chowra
Island. Toihlor) or Luro is spoken on Teressa Island, and the
closely related Po:9hot or Poahat is spoken on Bompoka Island.
The 2002 study considers Poahat to be a dialect of Luro.

The four speech forms spoken in the central portion of the
archipelago, on the islands of Nancowry, Camorta, Trinkut and
Katchall, are identified by the new survey as representing four
dialects of a single language. Rajasingh refers to this language
as Muot, with Muot proper being spoken on Nancowry Island.
Pinnow refers to the language spoken on the islands of
Nancowry and Camorta as Nancowry or Na:pkouri, whilst the
new survey assigns a distinct dialect name, viz. Kinlaka, to the
Camorta Island dialect. La:fu:l or Laful is spoken on Trinkut
Island, and Te:hjiu or Tehnyu is spoken on Katchall Island.

In the south of the Nicobar archipelago, Lo’ori or
Takahanyilang is spoken along the coast of Great Nicobar Island.
The 2002 survey groups together the forms of speech on the
islands of Milo, Condul and Little Nicobar as dialects of a
single language called Lamongse, with Lamongse proper being
spoken on Little Nicobar and Condul. Pinnow, however,
distinguished under the name Op the distinct variety spoken on
Little Nicobar Island, and reserved the term LaimorjJ'e for the
language of Condul. Miloh or Pihouny is spoken on Milo. Distinct
from all other Nicobarese languages is J'ompe or Shompen,
spoken in the hinterland of Great Nicobar Island.


The census of 1901 counted 3,451 Car Nicobarese, 522 natives
of Chowra, 702 Nicobarese on Teressa Island, a total of 1,095 
natives on the central portion of the archipelago, with just 192 
Nicobarese in the southern portion of the archipelago, in 
addition to 348 Shompen in the interior of Great Nicobar 
Island, giving a total native Nicobarese population of 6,310, 
excluding the 201 foreign traders then registered on the islands 
(Temple 1903, III: 142). Eighty years later, the 1981 census
enumerated a total of 20,940 native Nicobarese plus 223
members of the Shompen tribe (Singh 1988: 60). Of these 223
Shompen, 46 were registered as ‘workers’, and 44 were
recorded as being engaged in hunting and fishing. There were
reportedly four literate Shompen men and two literate women.
Recently, Singh reported that the major concentration of
Shompen was currently located ‘at a distance of 27 kilometres
from Campbell Bay on East West Road’ (1994a: 1076). The
Boxing Day Tsunami of 2004 disastrously affected the
demography of all Nicobarese language communities.

Time axis: 1000 AD, 0 AD, 1000 BC, 2000 BC, 3000 BC, 4000 BC,
5000 BC

Branches: Korku, Kherwarian, Kharia-Juang, Koraput, Khasian,
Pakanic, Eastern Palaungic, Western Palaungic, Khmuic, Vietic,
Eastern Katuic, Western Katuic, Western Bahnaric, Northwestern
Bahnaric, Northern Bahnaric, Central Bahnaric, Southern
Bahnaric, Khmeric, Pearic, Monic, Northern Aslian, Senoic,
Southern Aslian, Nicobarese

Higher-order groupings: Munda, Khasi-Khmuic, Vieto-Katuic,
Khmero-Vietic, Khmero-Bahnaric, Asli-Monic, Mon-Khmer,
Nico-Monic

Diagram 1: Diffloth’s (2001, 2005) Austroasiatic language family
tree with his tentative calibration of time depths


Early and Recent Glimpses of the Shompen Language 

Early Nancowry dictionaries and word lists of other
Nicobarese languages were first compiled by two men of
markedly different backgrounds, i.e. the Danish scholar
Frederik Adolph de Roepstorff (1870, 1875 and posthumously
1884) and the Englishman Edward Horace Man (1872, 1888,
1889b). Both men recorded data on the Shompen or Shom Pen
language. The Shompen are indigenous foragers who reside in
the hinterland of Great Nicobar Island, and their language has
always appeared to differ considerably from the other
languages spoken on the Nicobars.

Frederik Adolph de Roepstorff² was born on the 25th of
March 1842 at sea on a British vessel sailing from Madras to
Europe, a circumstance which entitled him to British
citizenship. He was christened at Cape Town and raised in
Denmark. After his schooling, he returned to India in 1867,
where he made use of his right to be recognised as a British
citizen to become extra assistant superintendent on the
Andamans in 1868, and later assistant superintendent of the
Nicobars in 1877. On the 11th of January 1872, during home
leave in Denmark, he married Hedevig Christiane Willemoes
(born 30 November 1843, died 21 August 1896 at
Copenhagen). He was murdered on the 24th of October 1883
by the bullet of a captive sepoy on Camorta (Bricka 1900, XIV:
519-520). His grave lies in ‘the little Camorta graveyard
where the bluff near the English settlement overlooks the
beautiful Nancowry harbour and the nestling huts of the
natives whom he loved so well’ (Chard 1884: i).

Edward Horace Man was born in Singapore on the 13th of
September 1846 and educated in England. He first arrived at
Port Blair in the Andamans in 1871 in order to take up
employment as an assistant superintendent under his father
Henry Stuart Man. Edward’s elder brother A.C. Man had
preceded him in 1869 and had already compiled a first
Andamanese word list, although this elder brother would later
be killed in Burma. During his many years in the Andaman
and Nicobar archipelagos, Edward Horace Man authored
numerous Andamanese and Nicobarese linguistic studies.
After his long service in the Nicobars and Andamans, he
enjoyed three decades of retirement in Brighton before dying
of an illness on the 29th of September 1929.

Before Frederik de Roepstorff and Edward Horace Man, 
data on Nicobarese languages were collected sporadically. As 
early as 1778, Fontana (1792) recorded the very first short 
Nicobarese word list, and David Rosen (1839), a Danish pastor, 
published 63 Nancowry words and the Nancowry numerals. 
Frederik de Roepstorff provides a good account of much 
earlier and contemporaneous fieldwork on the Nicobars, but de 
Roepstorff remains the first scholar ever to have collected 
Shompen data. He held the Shompen or ‘Shobængs’ to be ‘the
aborigines of the Nicobars’. He reported that ‘The Shobængs
at Great Nicobar are hostile to the Nancowry people who reside
along the coast, and not long ago a coastman was killed by
them. This happened in December 1872’ (1875: 2-3).

In contrasting his impressions of the Shompen as opposed 
to the coastal Great Nicobarese, Edward Horace Man seconded 
de Roepstorff’s opinion that the Shompen represented the true 
aboriginal population of the Nicobars. 

The Shom Pen have been — and I believe with good
reason — accepted as the pristine indigenes, and their
remote origin and purity of breed is apparently beyond
question, while the various sections of the coast tribe,
although differing from each other according to external
influences and other circumstances, are without doubt
descended from a mongrel Malay stock, the crosses
being probably in the majority of cases with Burmese,
and occasionally with natives of the opposite coast of
Siam, and perchance also in remote times with such of
the Shom Pen as may have settled in their midst; the
fact that the Shom Pen present Mongolian affinities
would thus to some extent account for the frequent
occurrence of the oblique eye in a more or less marked
degree throughout the group. (1889a: 365-366)

Frederik de Roepstorff described how he had been
‘fortunate enough to see one of these Shobængs. He was a big,
strong youth, nearly as well built as those of Nancowry’.
Based on his observation of the phenotypes, he developed a
theory that the modern Nicobarese or ‘Nancowry race’, who
‘inhabit Trinkut, Nancowry, Camorta, Katchall, Car Nicobar
and the coasts of Little and Great Nicobar’, had largely
replaced the original inhabitants of the Nicobars, who had been
‘attacked and driven away from the best places, and a remnant
of them is now found in the interior of Great Nicobar and on
the little isolated island of Schowra’ [i.e. Chowra, just
north-northwest of Teressa island] (1875: 3-4). Roepstorff
‘managed to collect only a few words’, he reported, ‘as it was
not easy matter to obtain them from my Shobæng acquaintance’.

In fact, de Roepstorff recorded 329 words or expressions in
‘the language of the Shobængs’ or ‘inland race’ in addition to
the Shompen numerals from one through ten. His comparative
Nicobarese list contains many more items from the languages
of Nancowry, Car Nicobar and Teressa Island and the Great
Nicobar coastal dialect spoken by a language community of
the ‘Nancowry race’. Later, Edward Horace Man, in his 1889
Nancowry dictionary, included 237 Shompen words,
expressions and the numerals in an appendix entitled
‘Comparative List of Words in Common Use in the Six
Dialects of the Nicobar Group’. At the time, Man estimated the
population of the Shompen to be ‘say 750-1000’.

After the pioneering work of de Roepstorff and Man, no
new linguistic data were seen from Great Nicobar Island for
over a century.⁴ Then in a small book which appeared in 2003,
two Bengali linguists Subhash Chandra Chattopadhyay and
Asok Kumar Mukhopadhyay made a considerable body of new
Shompen data available. The new field research yielded a
harvest of 723 Shompen words, 18 phrases and 23 sentences.
A copy of this rare publication was brought to Europe in the
spring of 2007 by my colleague and old friend Suhnu Ram
Sharma, who lent it to Laurie Reid, likewise a visiting scholar
at Leiden, and through Laurie also to Roger Blench of
Cambridge. The new Shompen data were studied in Amsterdam by
Roger Blench, and his comparison of the Shompen data with
Nicobarese and Austroasiatic lexical resources has now
appeared in print, viz. Blench (2007). The new Shompen data
were also made available to Gerard Diffloth, who assessed
them against the earlier Shompen data and his own
comparative Austroasiatic database.

In addition to the new data published by Chattopadhyay and 
Mukhopadhyay, unpublished material was collected by the late 
Rathinasabapathy Elangaiyan, who passed away on 18 January
2008. Elangaiyan undertook some eight to nine trips to the
Nicobars between 1983 and the period just before the tsunami
in 2004, staying for sojourns which varied in duration from two
to four months. His main focus was the Pu language of Car Nicobar
Island, but he also undertook to investigate the Shompen 
language in the interior of Great Nicobar Island. Elangaiyan 
visited the Shompen twice. Elangaiyan stayed at the Shompen 
Hut Complex, a collection of a few huts set up by the 
government to serve as the site for a health post and food 
distribution centre. There has never been a physician or any 
health workers permanently on duty at the hut complex, 
however. 

On his first visit, Elangaiyan arrived at the hut complex
with the assistance of porters whom he had hired. Elangaiyan
camped at the Shompen Hut Complex alone. Heavy rains
ensued, and later he was stricken with Plasmodium vivax
malaria. His condition and the water-logged terrain prevented
him from leaving the site. During his illness and convalescence,
the Shompen regularly visited him, and Elangaiyan conducted
his first fieldwork whilst being tended and looked after by the
helpful and friendly Shompen. After more than one and a half
months at the hut complex, a small number of naval people
came to the site for a picnic and stumbled upon Elangaiyan.
They sent back a message to the township and evacuated the
much weakened Elangaiyan.

On his second visit, Elangaiyan again stayed at the
township for a period of two and a half months. Elangaiyan’s
corpus of reliable data is scanty, he told me, because a
monolingual approach without any contact language severely
limits a linguist’s ability to ascertain the precise meaning of
target language forms. The fieldwork was consequently beset
with difficulties in establishing precise meanings. The fact
that the Shompen at the hut complex are monolinguals also
appears to have adversely affected the quality of the new data
set provided by Chattopadhyay and Mukhopadhyay, whose
fieldwork was subject to the same limitation. Elangaiyan
reported that his knowledge of Pu, the language of Car Nicobar,
was only somewhat helpful to him in dealing with the Shompen.


Elangaiyan prepared the native language primers for Pu, i.e.
Car Nicobarese, used in mother tongue instruction. These are
sound pedagogical textbooks. Likewise, the Shompen language
primer is based mainly on Elangaiyan’s fieldwork, and he is
mentioned as a co-author in the produced primer. However,
Elangaiyan was not at all pleased with the quality of the
Shompen primer. He had strong reservations about the
Shompen language primer even before its publication because
his fieldwork data, though valuable, were intended for
scholarly consumption by linguists only, with qualifications
about specific uncertainties regarding certain forms and
especially meanings. Nonetheless, administrative exigencies
compelled the hasty publication of the Shompen primer. The
Pu primers, entitled Ro Tarik 1 and Ro Tarik 2, appeared in
1985 and 1987 respectively, published in Devanāgarī script by
the Central Institute of Indian Languages at Mysore. The level
1 primer, entitled Shompen-Hindi Bilingual Primer Sompen
Bhāratī 1, written in Devanāgarī script, appeared in 1995,
jointly published by the Central Institute of Indian Languages
at Mysore and the Tribal Welfare Department of the Andaman
and Nicobar Administration at Port Blair. The Shompen
primer opens with the following words, authored by V.
Gnanasundaram and M.R. Ranganatha of the Central Institute
of Indian Languages at Mysore:

The Shompens are still a shy people who feel 
uncomfortable in the company of outsiders and at the 
first opportunity escape into the jungle. They never 
allow outsiders to know where they live. Their villages 
and homes are beyond the reach of outsiders. 

Gerard Diffloth and I looked at his copies of these 
Nicobarese primers. The Shompen primer data consist of the 
following 70 items: la:?0 ‘fish’, ka:?av ‘rain’, 0:k?a:t ‘girl’, 
k0:v ‘dog’, kag0y ‘stars’, kayay0y ‘parrot’, pa:?a ‘breadfruit’, 
hna?u ‘pig’, mi:?i ‘owl’, ce?0?i: ‘black’, cetiyu ‘red’, ni:yi 
‘mouse’, hiv ~ hi:v ‘sun’, giya:v ‘scorpion’, papvo ‘bamboo’, 
ph0p0: ‘beehive’, hm0p0y ‘snake’, paduvi ‘hoe’, j0va:k 
‘spider’, lovvu ‘necklace or bracelet’, thlovvu ‘stone, rock’, 
ce:tuvi ~ ceithuvi ‘old man’, tao.yc. ‘cockroach’, dce:diyav 
‘woman’, ba:pa:y ‘papaya’, o:k?a:y ‘infant’, cyo:y ‘macaque’, 
tyo:y ‘bread, taro, potato’, do:?o: ‘hill’, d:0 ‘mosquito’, valid: 
‘branch’, eha: ‘root’, md:0:v ‘butterfly’, okhlam ‘man pointing 
with both index fingers to the sides of his head’, niyo ‘house’, 
Ma:tayo ‘housefly’, 0yd ‘bat’, ceylev ~ ceyayov ‘centipede’, 
pa:niyo: ‘log’, pa:tipyu ‘tree’, agaynhyd: ‘cloud’, cehyu: 
‘pigeon’, nu:yi ‘squirrel’, pma?d:v ‘frog’, tydvgo: ‘beach, 
sand’, togh0y0 ‘mango’, bowu ‘sprout’, tomh0ya:v ‘coconut’, 
po:?a ‘eagle’, 1.0v ‘thigh’, miy0v ‘cheek’, to:y ‘lip’, nap ‘ear’,
ipa:yahi ‘chin’, hma:n ‘eyes’, hiy0hp ‘anklebone’, nuva:n
‘neck’, kumam ‘forehead’, hog?a:y ‘waist’, ugiy0v ‘fingernail’,
iya:i ‘tongue’, l0g0:v ‘crab’, pahd: ‘leaf’, m0:?0y ‘banana’,
omiyo: ‘cat’, tig?a:k ‘gavial’, op?a:k ‘lead’, phayayov ‘red
ant’, h0gvo: ‘sea’, k?a:y ‘moon’. 

The romanisation here is a transliteration of the Devanāgarī
orthography specifically developed for the Shompen primer
and is based on the phonetic explanations provided on two
unnumbered pages in the introduction. We have made a
number of transcriptional decisions. For example, certain
phonetic symbols have been introduced to transliterate
newly devised Devanāgarī vowel signs, and a vowel that might
in fact be some central vowel has been transliterated here from
the Devanāgarī orthography as [0], in strict adherence
with the description provided in the front of the primer. The
primer gives the Shompen words for ‘sun’, ‘centipede’ and
‘old man’ in two different Devanāgarī spellings. The meaning
of some words was difficult to ascertain on the basis of the
accompanying illustration alone. Although Elangaiyan stressed
the unreliability of the data in this primer and the possibility
of intra-Nicobarese loans in the data, Gerard Diffloth observed
that it is nonetheless easy, even upon casual observation, to
spot several well-known Nicobarese and Mon-Khmer etyma
reflected in the data culled from this Shompen primer, e.g. nap
‘ear’, 1.0v ‘thigh’, niyo ‘house’, tomh0ya:v ‘coconut’.

Observations Regarding the Shompen Material 

Other than the Shompen primer and Elangaiyan’s unpublished 
notes, the Shompen material comprises three distinct data 
sets. The early material consists of the 339 ‘Shobæng’ words 
or expressions, including the numerals from one to ten, that 
were published by de Roepstorff in 1875 and the 237 ‘Shom 
Pen’ words, expressions and numerals published by Man in 
1889. Man reported that the name ‘Shom Pen’ was the coastal 
Great Nicobarese term for the inland people, consisting of the 
element shom, signifying ‘people’ or ‘natives’, and pen, the 
proper name of a tribe, pronounced like French pain. The 
Shompen themselves, according to Man, referred to 
themselves as Shab Daw'a (1886: 432). The third data set, 
presented in 2003 by the two Bengali linguists Subhash Chandra 
Chattopadhyay and Asok Kumar Mukhopadhyay, comprises 
Shompen words, 18 phrases and 23 sentences. 





Impressions of Shompen phonology can be gleaned from 
the available material. Frederik de Roepstorff’s notation 
distinguished a ~ a, and perhaps this orthographic distinction 
denoted two distinct vowels, viz. /a/ vs. /a/, in accordance with 
Indological convention. His notation also differentiated e ~ e 
and o ~ o. These distinctions suggest a possible length contrast 
or tense vs. lax opposition. Similarly, Man’s notation 
differentiated the Shompen vowels a ~ a ~ a and also made the 
distinctions e ~ e, i ~ ī, o ~ 6 ~ o and u ~ u. Chattopadhyay and 
Mukhopadhyay describe Shompen as having seven or eight 
vowels /i, e, e, a, a, o, o, u/, depending on what we are inclined 
to think about the contrast represented as a ~ a. All eight of 
these vowels can reportedly be nasalised. Due to font 
difficulties, Chattopadhyay and Mukhopadhyay use capital E 
for Shompen /ɛ/ and capital O for the vowel /ɔ/. Blench takes 
Chattopadhyay and Mukhopadhyay’s account at face value and 
accepts their orthographic distinction a ~ a as representing 
a length contrast, whilst I am inclined not to exclude the 
possibility that what the two authors mean by ‘phonemic 
length’, restricted to just this one Shompen vowel, might very 
well just represent two vowels of an altogether different timbre. 

The Shompen consonant phoneme inventory according to 
Chattopadhyay and Mukhopadhyay comprises the phonemes /ʔ, 
k, kh, g, gh, ŋ, c, j, ɲ, t, th, d, n, p, ph, b, bh, m, y, y, l, w, ɸ, x, 
h/. Shompen purportedly lacks a phoneme /dh/, analogous to 
Shompen /gh/ and /bh/. Shompen has no sibilants, but has the 
fricatives /ɸ/ and /x/. Shompen has a phonemic glottal stop. In 
the notation used by Blench, Chattopadhyay and 
Mukhopadhyay’s symbols ?, ṅ and ñ have been replaced by the 
more current phonetic symbols ʔ, ŋ and ɲ respectively. 

In evaluating the Shompen lexical material, the differences 
between the three data sets are the first observation to which 
any close scrutiny will lead. Chattopadhyay and Mukhopadhyay’s 
(2003) data set resembles that of Man (1889b), but neither 
Chattopadhyay and Mukhopadhyay nor Man very closely 
resemble de Roepstorff’s (1875) data set. At the same time, the 
selection of lexical items reflected in the material collected by 
Chattopadhyay and Mukhopadhyay appears to be somewhat 
imbalanced. There are two likely causes to which these 
discrepancies might be attributed. 

First, Man observed that Shompen is not so much a single 
language as an internally diverse group of inland dialects, with 
each community possessing ‘a dialect more or less distinct, but 
this is what might reasonably be expected when we consider 
the isolation of the several encampments, and the difficulties of 
intercommunication, apart even from the hostile relations in 
which they stand towards one another’ (1886: 449). Man 
remarked in particular that the dakan-kat dialect 5 of Shompen 
spoken near Kashindon on the west coast exhibited a high 
degree of lexical divergence from the Shompen spoken at 
Lafal and Ganges Harbour (1886: 448). 

Over a century later, Chattopadhyay and Mukhopadhyay 
too reported two groups of Shompen. One Shompen 
population is a semi-nomadic hunter-gatherer group ‘living in 
deep forests in the northern and the central parts of the island 
around the Galathia and the Alexandria rivers’. They barter 
jungle produce for food and also receive food and medical care 
through a government welfare programme. They hunt with 
spears and are reportedly unfamiliar with bow and arrow. The 
other Shompen group lives on the east coast of Great Nicobar 
where they ‘are in better contact, especially with the local 
Nicobarese tribe’. The eastern coastal group speak some Lo’oŋ, 
i.e. coastal Great Nicobarese, and some of these Shompen also 
understand Hindi and frequent the government offices at 
Campbell Bay. Chattopadhyay and Mukhopadhyay reportedly 
collected their data ‘from the last week of December 2000 up 
to the 1st week of February 2001’ from the semi-nomadic deep 
forest group at the Shompen Hut Complex, located 27 km from 
Campbell Bay on the East-West Road. The authors assert that 
these deep forest Shompen never go to Campbell Bay (2003: 
1-3). 

Secondly, an impression which Gerard Diffloth and I 
shared when studying the 2003 data set is that another cause 
for the discrepancy between the three available data sets might 
be a fieldwork problem especially affecting the most recent 
study. It is unclear which contact language the researchers 
used with the reportedly monolingual and shy Shompen and 
what consequences this difficult fieldwork situation may have 
had on the quality of the data elicited. Chattopadhyay and 
Mukhopadhyay record the Shompen pronominal forms io ~ iho 
‘I’, ca ‘my’, email ‘we’ (dual exclusive), eo ‘we’ (dual 
inclusive), eo ‘he’, ond ‘his’. Yet the data set contains no 
words for ‘we’ in the plural (vs. the dual), nor does the 
glossary contain any second person pronominal form. However, 
the authors record three utterly different words for 
‘vagina’, i.e. ipuddo, ugdu, totoghdb. Also, Shompen 
purportedly has a lexicalised expression yidi igoki, glossed by 
Chattopadhyay and Mukhopadhyay as ‘dismatting’ (2003: 37), 
an unfamiliar, possibly administrative term which can also be 
found on a few Keralan and Bengali websites. 

The new data set by Chattopadhyay and Mukhopadhyay 
provides the Shompen form koceoŋ for ‘cat’, a Malay loan 
word found throughout the Nicobars, but Frederik de 
Roepstorff recorded an abbreviated form tjing for Shompen 
‘cat’. It is conceivable that the truncated form was the earlier 
loan which Shompen acquired from Lo’oŋ or Coastal Great 
Nicobarese, and that the word was subsequently loaned again. 
Nandan (1993: xx) records the Coastal Great Nicobarese form 
kuching ‘cat’. Finally, Chattopadhyay and Mukhopadhyay 
report that the basic syntactic element order of Shompen is 
verb-subject-object (VSO). 

Chattopadhyay and Mukhopadhyay’s data set is therefore 
problematic, and a comparative study based on the 2003 data 
set led Roger Blench to conclude that Shompen has ‘no 
obvious relationship with other Nicobarese languages or other 
Mon-Khmer languages’. Blench goes on to speculate that: ‘As 
with the Andamans, the possibility that the Shom Pen 
represent a relic of early human expansion around the rim of 
the Indian Ocean should be seriously considered’. Is Shompen 
then not Austroasiatic at all and therefore perhaps a language 
isolate of South Asia like Nahali, Vedda, Kusunda or 
Burushaski? Have the new data changed our view of Shompen? 
What are the possible implications of the new Shompen data 
for ethnolinguistic prehistory? 6 

Only a thorough holistic description of the language can 
resolve such uncertainties. New work on Shompen urgently 
needs to be undertaken by a gifted and dedicated field linguist 
willing to brave the dangers of malaria and the discomforts of 
conducting fieldwork at the Shompen Hut Settlement. There a 
linguist could take up the challenge of conducting arduous 
work with monolingual Shompen speakers. Also, new 
comparative tools such as Stampe’s Munda database and 
Shorto’s (2006) comparative Mon-Khmer dictionary are now 
available. Diffloth (2008) should be carefully consulted, 
however, before considering using Shorto (2006) as a 
reference. 7 At the same time, new data on Nicobarese 
languages have been provided in several studies, e.g. 
Whitehead (1925), Radhakrishnan (1981). 

Meanwhile, we can best trust Gerard Diffloth’s assessment 
of the more reliable earlier Shompen data collected by 
Frederik de Roepstorff and Edward Horace Man in light of his 
comparative Austroasiatic database. Diffloth assesses that ‘out 
of 222 Shompen lexemes, 109 have cognates with other 
Nicobarese languages’, whereas ‘102 have no identifiable 
cognates, and 7 have South Mon-Khmer cognates not found 
in other Nicobarese languages’. Two of the 222 lexical items 
can be identified as borrowings from Malay. Out of the 109 
shared Nicobarese etyma in Shompen, 57 also have good 
Southern Mon-Khmer cognates. The seven Shompen lexical 
items that have no Nicobarese cognates but are shared with 
other South Mon-Khmer or Nico-Monic languages are toak 
‘afraid’, hohom ‘bathe’, al0v ‘pig’, chuk ‘foot’, kateap ‘egg’, 
kakoay ‘sit’ and kamyak ‘husband’. Diffloth also points out that 
Shompen has undergone a regular sound change, whereby 
Austroasiatic final nasals, retained as final nasals in 
Nicobarese and most mainland Mon-Khmer languages, are 
reflected as devoiced stops. This fact indicates that such good 
Austroasiatic roots cannot have been borrowed from mainland 
Mon-Khmer languages, and that Shompen is a language 
belonging to the Nicobarese branch, not a language isolate 
(Diffloth 2007). 
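
The proportions implied by these counts can be worked out with a few lines of arithmetic. The following is only an illustrative sketch in Python: the figures are those quoted in the passage above, whereas the category labels, variable names and the helper function share are assumptions introduced here for the purpose of illustration. As quoted, the four categories account for 220 of the 222 lexemes, and the passage does not state how the remaining items are classified.

```python
# Counts as quoted in the passage above (Diffloth's assessment).
# The labels and variable names are illustrative assumptions.
TOTAL = 222
counts = {
    "cognates in other Nicobarese languages": 109,
    "no identifiable cognates": 102,
    "South Mon-Khmer cognates only": 7,
    "borrowings from Malay": 2,
}


def share(n, total=TOTAL):
    """Return n as a percentage of total, rounded to one decimal place."""
    return round(100 * n / total, 1)


# Share of each category in the 222-item sample.
shares = {label: share(n) for label, n in counts.items()}

# Items not covered by the four categories as quoted.
unassigned = TOTAL - sum(counts.values())

# Of the 109 shared Nicobarese etyma, 57 also have Southern Mon-Khmer cognates.
also_southern_mk = share(57, counts["cognates in other Nicobarese languages"])

for label, pct in shares.items():
    print(f"{label}: {counts[label]} ({pct}%)")
print(f"not assigned to any quoted category: {unassigned}")
print(f"Nicobarese etyma with Southern Mon-Khmer cognates too: {also_southern_mk}%")
```

On these figures, roughly half of the sampled lexemes have Nicobarese cognates, and just over half of those are also matched in Southern Mon-Khmer.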

The Physical Anthropology of the Shompen 

Even in the old physical anthropology of frizzy hair and 
phenotypes, the somatological affinities of the Shompen were 
a heated topic from the start. The proximity of the negrito 
populations of the Andamans in conjunction with the idea that 
the inland Shompen represented some aboriginal remnant 
group suggested to the minds of many that the Shompen too 
were a negrito people. Frederik de Roepstorff was the first to 
assail the then widely held view that the Shompen were a 
negrito population. He maintained that the Shompen were of 
‘Mongoloid’ stock. Some resisted this idea, preferring to 
entertain the view that the Shompen were of ‘Negrito stock, 
allied to the Andamanese or the Semangs of the Malay 
peninsula’ (Distant 1879: 336). 8 

A detailed old-fashioned physical anthropology of the 
Nicobarese peoples is provided by Man, who noted that the 
‘characteristic tint’ of the Shompen was ‘a dull brown’ lacking 
‘the healthy appearance which distinguishes the coast people’ 
(1889a: 390). The ossuary practices on the islands of Bompoka 
and Teressa suggested to Bonington early cultural contacts 
with Melanesians or, in his own words, ‘the existence of a 
strong Melanesian element in the Nicobars in spite of their 
Mon language’ (1926: 106). Studies such as Ball (1881), Man 
(1889a), Boden Kloss (1903) and Meerwarth (1919) contain 
interesting descriptions and valuable photographic 
documentation of the Nicobarese people and their architecture. 
Recent accounts of the Nicobarese in their current 
circumstances, sometimes including pictorial documentation, are 
provided by Agarwal (1967), Dagar and Dagar (1999), Krishan 
(1986), Lal (1977), Justin (1990), Nandan (1993) and Rizvi 
(1990). 

The new physical anthropology focuses on molecular 
polymorphisms in the double helices of the chromosomes and 
on the mitochondrial DNA. Recently some molecular genetic 
work has been done on the Shompen. Twelve Shompen males 
were sampled in a study, and all were found to bear the O2a 
(M95) haplogroup on their Y chromosome (Trivedi et al. 
2006). 9 This single nucleotide polymorphism has been 
identified as a possible marker for a paternal lineage reflecting 
an ancient male-driven spread of the Austroasiatic language 
family (van Driem 2007). 10 In fact, the correlation of linguistic 
and population genetic findings has suggested that many 
language communities speak father tongues rather than mother 
tongues. Languages and entire language families appear often 
to have been disseminated by male speakers. 

The widespread nature of the correlation of language with a 
few predominant Y haplogroups suggests that it must have 
been a recurrent motif in ethnolinguistic history that mothers at 
one point in time were compelled to raise their children in the 
language of the fathers. Based on the work of Estella Poloni 
and her teammates (1997, 2000), this phenomenon, which I 
called the ‘Father Tongue hypothesis’ in Taipei in 2002, has 
consequences for the way historical linguists will in future 
have to think about language change. This phenomenon also 
opens up the question of whether the sexual dimorphism in our 
species with respect to linguistic abilities and language 
sensibility could have its evolutionary origins in the dynamics 
of warfare, competition and linguistic assimilation between 
rival language communities in an ancestral age. 

Trivedi et al. (2006) do not specify other single nucleotide 
polymorphisms (SNPs) which they may have typed that might 
have distinguished different lineages within the clade. This 
would have been helpful, for we have more recently come to 
know that the O2a (M95) haplogroup can be subdivided into 
O2a*, bearing only the M95 mutation, and O2a1a (PK4) and 
O2a1* (M88, M111). In their study, the short tandem repeats 
(STR) within the O2a haplogroup suggested a greater affinity 
between the Shompen and the Munda than with other 
Nicobarese, and the greatest distance to Austroasiatic language 
communities of Southeast Asia. However, short tandem 
repeats are highly variable and especially useful as forensic 
markers. Therefore, whilst the STR profile provided by 
Trivedi et al. (2006) is suggestive, the short tandem repeats 
provide no clear-cut picture of affinities and lack monophyletic 
resolution. Trivedi et al. (2006) claim that the Shompen 
represent the ‘descendants of Mesolithic hunter-gatherers’. 
Although their data provide no support for this assertion, it 
may of course be true that most people on earth today happen 
to descend from Mesolithic hunter-gatherers at some time and 
place. 

The mitochondrial DNA of the Shompen is reportedly 
characterised by the two clades B5a and R12. The B5a 
configuration represents a newly identified clade with a 
coalescence age of 17,000 years and geographical distribution 
mainly in insular and littoral Southeast Asia. The ‘R12’ clade, 
which will probably be relabelled ‘R22’ in the newly emergent 
conventional mtDNA nomenclature, is common amongst other 
populations native to the Nicobars and represents a lineage 
which is also seen in Vietnam, Indonesia, the Philippines and 
on Taiwan. In short, the population genetic data can be seen as 
corroborating to some extent the linguistic view that we have 
of Nicobarese as a branch of Austroasiatic, though, of course, 
population genetic data should not necessarily be expected to 
do so. The newly developed autosomal markers have yet to be 
tested on the Shompen, other Nicobarese peoples and 
Austroasiatic language communities. 

Linguistic Palaeontology and the Austroasiatic Homeland 

In addressing the question of the precise whereabouts of the 
Austroasiatic ancestral homeland from a purely linguistic point 
of view, the two foremost criteria in our deliberations are the 
findings of linguistic palaeontology and the geographical 
centre of gravity of the language family based on the 
distribution of modern Austroasiatic language communities and 
deep phylogenetic divisions in the family. Then these 
inferences can be critically assessed in view of relevant 
information from other fields such as archaeology and 
population genetics. The distribution of the modern language 
communities and the geography of the deepest historical 
divisions in the family’s linguistic phylogeny would put the 
geographical centre of the family somewhere between South 
Asia and Southeast Asia, in the area around the northern coast 
of the Bay of Bengal. 

Gerard Diffloth pointed out in his keynote address on 
‘Considerations of the homeland of Austroasiatic’, with which 
he inaugurated the 3rd International Conference on 
Austroasiatic Linguistics (ICAAL 3) at Deccan College on 26 
November 2007, that nobody knows the higher-level nodes of 
Austroasiatic for sure, which leaves the question of the earliest 
branchings undetermined. If the deepest division in the family 
lies between Munda and the rest, as an older generation of 
scholars used to suspect, then the geography of deep historical 
divisions in linguistic phylogeny would compel us to look for a 
homeland on either side of the Ganges delta, although we 
would be unable to say precisely whether this homeland would 
have to have lain to the east or to the west of the delta. If we 
assume the veracity of Diffloth’s new tripartite division, shown 
in Diagram 1, the geography of the deepest phylogenetic 
divisions within Austroasiatic would likewise suggest a 
homeland in this region. 

Linguistic palaeontology, a term introduced by Adolphe 
Pictet in 1859, is an attempt to understand the ancient material 
culture of a language family on the basis of the lexical items 
which can be reliably reconstructed for the common ancestral 
language. The linguistic palaeontology of Austroasiatic 
strongly qualifies the ancient Austroasiatics as the most likely 
candidates for the first cultivators of rice. At the same time, 
Diffloth has shown that the reconstructible Austroasiatic 
lexicon paints the picture of a fauna, flora and ecology of a 
tropical humid homeland environment. 

Diffloth (2005: 78) has shown that three salient isoglosses 
diagnostic for the faunal ecology of the Proto-Austroasiatic 
homeland can be reconstructed all the way to the Austroasiatic 
level and are reflected in all branches, including Munda, i.e. 
*mra:k ‘peacock, Pavo muticus’, *torkuot ‘tree monitor lizard, 
Varanus nebulosus or bengalensis’ and *tonyu:? ‘binturong’ or 
the ‘bear cat, Arctictis binturong’, a black tropical mammal that 
is the largest of the civet cats. None of these species is 
native to areas that currently lie within China, and, to our 
present knowledge, these species never were native to the area 
that is today China. More reconstructible Proto-Austroasiatic 
roots indicative of a tropical or subtropical climate are adduced 
by Diffloth (2005: 78), i.e. *(bsn)jo:l ~ *j(orm)o:l ‘ant eater, 
Manis javanica’, *dokan ‘bamboo rat, Rhizomys sumatrensis’ 
(an Austroasiatic root which has found its way into Malay as a 
loan), *kaciaŋ ‘the Asian elephant, Elephas maximus’, *kiac 
‘mountain goat, Capricornis sumatrensis’, *roma:s ‘rhinoceros, 
Dicerorhinus sumatrensis’ and *tonriak ‘buffalo, Bubalus 
bubalus’. 

Finally, Diffloth (2005: 78) points out a fact long noted by 
scholars of Austroasiatic linguistics, e.g. Osada (1995), namely 
that a rich repertoire of reconstructible roots representing 
ancient rice agriculture is robustly reflected in all branches of 
Austroasiatic, viz. *(ks)ɓa:? ‘rice plant’, *roŋko:? ‘rice grain’, 
*coŋka:m ‘rice outer husk’, *kondok ‘rice outer husk’, *phe:? 
‘rice bran’, *tompal ‘mortar’, *jonre? ‘pestle’, *jompior 
‘winnowing tray’, *gu:m ‘to winnow’, *prmuol ‘dibbling stick’ and 
*kontu:? ‘rice complement’, i.e. accompanying cooked food 
other than rice. 

Nicole Revel (1988) contributed one of the most elaborate 
ethnobotanical studies on rice, rice cultivation practices and 
rice terminology in various Asian language communities. The 
other main candidates for early cultivators of rice are the 
ancestral Hmong-Mien. Great strides have been made in our 
understanding of Hmong-Mien historical phonology 
(Haudricourt 1954, Purnell 1970, Wang and Mao 1995, 
Niederer 1998), although the reconstructible lexicon specific to 
rice cultivation is less impressive than the Austroasiatic 
repertoire. The three Hmong-Mien etyma relating to rice 
cultivation that appear to be original to the linguistic phylum 
are *ntso:i ‘husked rice’, *na:n ‘cooked rice’ and *njeŋ ‘rice 
head, head of grain’, whereas the Hmong-Mien terms for 
glutinous (rice), (paddy) field, sickle, rice cake and (rice) 
seedling ‘are likely to have had a Chinese origin’ (Ratliff 2004: 
158-159). 

The rice story is complex, and the plot of the story has 
changed more than once in recent decades. Whereas the origin 
of rice cultivation was once held ‘incontestably’ to have lain in 
the Indian subcontinent (Haudricourt and Hedin 1987: 159-161, 
176), subsequent scholarship moved the homeland of rice 
agriculture from the Ganges to the Yangtze. For years, 
conventional wisdom in archaeological circles dictated that 
rice was domesticated in the Middle Yangtze, perhaps as early 
as the sixth millennium BC. 

More recently, scholars have increasingly begun to take 
note of findings that would move the original homeland of rice 
cultivation back to the Indian subcontinent. Against the 
background of older datings of domesticated rice and ceramic 
culture from Gangetic basin and Doab sites such as Koldihawa 
and Mahagarha, reportedly dating from the seventh 
millennium BC (Sharma et al. 1980, Pal 1990, Agrawal 2002), 
there are now newer sites with more reliable dates at 
Lahuradewa (Lahuradeva), Tokuva and Sarai Nahar Rai. 

At the Lahuradewa site (26°46’ N, 82°57’ E), the early 
farming phase, corresponding to period 1A in the site’s 
clear-cut stratigraphy, has radiocarbon dates ranging from ca. 
5300 to 4300 BC. Carbonised material from period 1A was 
collected by the flotation method, yielding Setaria glauca and 
Oryza rufipogon as well as a morphologically distinct, fully 
domesticated form of rice ‘comparable to cultivated Oryza 
sativa’ (Tewari et al. 2002). More recently, accelerator mass 
spectrometry dates were obtained on the rice grains themselves, 
corroborating the antiquity of rice agriculture at the site. 

Most recently, new radiocarbon dates for rice agriculture 
have been coming from the Ganges basin, with the Tokuva site 
near Allahabad now yielding similar dates (Vasant Shinde 
[Vasant Sivaram Sinde], personal communication 27 
November 2007), and exciting new dates for ancient rice 
agriculture are also emerging from Sarai Nahar Rai (Manjil 
Hazarika, personal communication 7 March 2008). Of course, 
we are living at a time when a more reliable calibration of 
radiocarbon dates in general has become a matter of great 
urgency. At the same time, as Prof. Ram Dayal Munda of 
Ranchi University pointed out in his inaugural address at the 
opening session of the 3rd International Conference on 
Austroasiatic Linguistics (ICAAL 3), the bulldozer effect of 
globalisation in present and former Munda areas is effacing the 
traces of ancient Austroasiatic archaeology and palaeobotany. 

Further east, at least five species of wild rice are native to 
northeastern India, viz. Oryza nivara, Oryza officinalis (O. 
latifolia), Oryza perennis (O. longistaminata), Oryza 
meyeriana (O. granulata) and Oryza rufipogon, and reportedly 
over a thousand varieties of domesticated rice are currently in 
use in the region (Hazarika 2005, 2006a). The different 
varieties of rice in northeastern India are cultivated in three 
periods by distinct cultivation processes. In the process of āhu 
kheti, the rice is sown in the months of Phāgun and Sot, i.e. 
mid February to early April. The seedlings are not transplanted 
but ripen in just four months in fields which must be constantly 
weeded. In bāo kheti, the rice seedlings are sown from mid 
March to mid April in ploughed wet fields and likewise do not 
need to be transplanted. In sāli kheti, the rice is sown from mid 
May to mid June, and the seedlings are transplanted. Sāli kheti 
rice varieties are suspected to derive from the wild officinalis 
rice still widely found in swampy village areas. The wild 
rufipogon rice cannot be used for human consumption because 
the plants shed their seeds before they ripen, so that rufipogon 
rice is used in Assam and other parts of northeastern India as 
cattle feed (Hazarika 2006b). 

Whilst claims have been published of rice cultivation in 
East Asia as long ago as around 10,000 BC, the currently 
available evidence indicates that immature morphologically wild 
rice may have been used by foragers before actual domestication 
of the crop, e.g. at the Bashidang site (7000-6000 BC) 
belonging to the Pengtoushan culture in the Middle Yangtze and 
at sites in the Yangtze delta area such as Kuahuqiao, 
Majiabang (5000-3000 BC) and Hemudu (5000-4500 BC). 
However, only ca. 5000 BC was the actual cultivation of rice 
probably first undertaken by people in the Lower Yangtze, who 
at the time relied far more heavily on the collecting of acorns 
and water chestnuts (Yasuda 2002, Fuller 2005a, 2005b, 2005c, 
2006a, 2006b, 2006c, 2007a, 2007b, Fuller et al. 2007, Zong et 
al. 2007). There is also currently no evidence for the 
co-cultivation of rice and foxtail millet along the middle 
Yangtze until around 3800 BC (Nasu et al. 2006). 

Today, our understanding of the palaeoethnobotanical 
picture is more complex. The two main domesticated varieties 
of rice, Oryza indica and Oryza japonica, are phylogenetically 
distinct and would appear to have been domesticated 
separately. Oryza indica derives from the wild progenitor 
Oryza nivara and was first cultivated in South Asia or western 
Southeast Asia, perhaps in two separate domestication events. 
On the semi-arid Gangetic plain at the end of the mid- 
Holocene wet period, habitats for wild rices increasingly 
shifted to oxbows as palaeochannels dried up and turned into 
oxbow ponds. This shift favoured monsoonal rather than 
marshland rice species, including Oryza nivara, the wild 
progenitor of Oryza indica (Fuller 2006a). 

Oryza japonica derives from the wild progenitor Oryza 
rufipogon, and it is currently believed that the rufipogon 
variety was first cultivated to yield early Oryza japonica along 
the Middle Yangtze. Harvey et al. (2006) have critically 
reassessed the morphometrics of rice finds associated with 
various Neolithic sites throughout the Yangtze basin in light of 
recent genetic findings. It appears that the wild progenitor 
Oryza rufipogon was not fully domesticated in the Lower 
Yangtze to yield early Oryza japonica until ca. 4000 BC. 
Generally, the archaeological record shows a delay of one to 
two millennia between the beginning of cultivation and the 
first clear evidence of domestication sensu stricto, i.e. genetic 
modification by selective breeding. 

Twelve wild forest-margin rice species are known, found 
mostly in Southeast Asia as well as at old sites of human 
habitation, e.g. Jiahu in the seventh millennium BC or Hemudu 
in the first half of the fifth millennium BC. Extinct wild 
varieties of rice also appear to be preserved in the modern 
japonica genome. Based on the genetics of the officinalis 
variety, the seasonally wet, puddle-adapted Oryza nivara, and 
the always wet perennial Oryza rufipogon, there may be 
evidence for multiple rice domestications in South, Southeast 
and East Asia. So, maybe the domesticators of Oryza nivara 
were ancient Austroasiatics, and maybe the domesticators of 
ancient Oryza rufipogon were ancient Hmong-Mien. 

O’Connor (1995) and Blench (2001) have argued that 
irrigated rice agriculture enabled people to seize control of 
lowlands and flood plains. People were able to move down 
from upland areas that had hitherto been more favourable 
habitats after wet cultivation had transformed lowlands from 
epidemiologically undesirable places into bountiful habitats. 
But what if the first cultivators and domesticators of rice 
already inhabited lowland river basins and flood plains, such 
as the Ganges or Yangtze basins or even the Brahmaputran 
flood plains? 

Turning to northeastern India and the Indo-Burmese 
borderlands, we must recognise that, notwithstanding the 
excellent archaeological work conducted in the Ganges and 
Yangtze river basins, much of the archaeology of ancient rice 
agriculture is simply not known because no substantive 
archaeological work has been done on the Neolithic in the 
most relevant areas, e.g. northeastern India, Bangladesh and 
Burma. The sheer dearth of archaeological research in these 
areas leaves entirely open the possibility that rice cultivation 
may have originated in this region. We might expect to find 
traces of ancient farming communities better preserved in the 
hill tracts surrounding the Brahmaputran flood plains than on 
the fertile fields themselves, although the earliest rice-based 
cultures may first have developed on those very flood plains. 
Perhaps the remains of the first rice-cultivating cultural 
assemblages lie buried forever in the silty sediments of the 
sinuous lower Brahmaputran basin or were washed out by the 
Brahmaputra long ago into the depths of the Bay of Bengal. 

NOTES 

1. Unless stated otherwise, I first provide the language name given 
by Pinnow (1959) and then the recently introduced language 
name identified in the 2002 pilot survey. I thank V.R. Rajasingh 
for kindly providing me with these newer names from their yet 
unpublished pilot survey report. 

2. The surname has sometimes appeared in print in the orthography 
‘de Roepstorff’. 

3. In a study published in the formerly Danish city of Lund, 
Simron Jit Singh (2003) provides a valuable historical account of 
European dealings in the Nicobars, with special emphasis on the 
Danes, yet somehow he manages to entirely overlook Frederik 
Adolph de Roepstorff. 

4. In 1993, Nandan included a glossary of 137 words and 
expressions from Great Nicobar, including several obvious Indo- 
Aryan loans like ‘chapati’, ‘dal’, ‘ata’ and ‘ghee’. Judging from 
the items, the language documented is Lo’oŋ, the coastal dialect 
of Great Nicobar, not Shompen, e.g. Nandan’s nong ‘ear’ vs. 
Shompen gna, Nandan’s pukoi ‘pig’ (cf. de Roepstorff’s bakoi ) 
vs. Shompen nong, Nandan’s em ‘dog’ vs. Shompen kiip. 

5. The term dakon-kat would appear to denote the ‘ill-adjusted 
loin-cloth’ worn by this group of unkempt Shompen ‘which they 
evidently wear in imitation of the neng of the coast men’ (Man 
1886: 447). 

6. Chattopadhyay and Mukhopadhyay venture an attempt to relate 
Shompen to Tibeto-Burman, Kra-Dai (Daic), Austroasiatic and 
Austronesian. To this end, the only evidence adduced consists of 
three Shompen, Fijian and Samoan lexical items glossed as 
‘canoe’, ‘pandanus’ and ‘coconut’. 

7. In fact, it may not be too late to follow up on Diffloth’s 
suggestion of publishing a photo-facsimile edition of Shorto’s 
original manuscript and notes, just as the Soviet Academy of 
Sciences did belatedly in 1960 with the valuable polyglot notes
of the murdered Tangut scholar Nikolaj Aleksandrovič Nevskij
(cf. van Driem and Kepping 1991, van Driem 1993). 

8. In his recounting of the tale, Roger Blench writes that ‘the fact
that the Shom Pen have straight hair, like the Nicobarese,
brought an untimely end to such speculation’, i.e. the conjecture
of early ethnographers that the Shompen might represent a
missing link between the Andamanese and the indigenous
negrito population groups of the Malayan peninsula. This
statement is placed underneath a photograph showing at least
two Shompen men with unmistakably frizzy hair, one of whom
could even be said to be sporting the coiffure once popularly
referred to as an ‘afro’. Blench hastens to observe, however, that
‘the issue of straight hair has been questioned, with some
populations apparently having wavy hair’.

9. Some Nicobarese population genetic data were also included in 
recent Andamanese studies, i.e. Thangaraj et al. (2003),
Thangaraj et al. (2005), Palanichamy et al. (2006). 

10. Kumar et al. (2007) essentially corroborate my interpretation 
of the earlier work on the O2a haplogroup and conclude on the
basis of M95 ‘that the Mundari populations are one of the 
earliest settlers in the Indian Subcontinent’. The study by Kumar 
et al. (2007) is informative for the Munda groups, though the 
dating is wrong. Their article argues in favour of a hypothesis 
about Austroasiatic origins which is entirely untestable on the 
basis of their sampling, including their speculation that ‘these 
populations have come from Central Asia through the Western 
Indian corridor and subsequently colonized Southeast Asia’.

CORPUS-BASED ANALYSIS BASED ON
BODDING’S SANTAL DICTIONARY 1

Minegishi, Makoto†, Takashima, Jun‡, Murmu, Ganesh§


ABSTRACT 

Santali, the most populous member of the North 
Munda subgroup of the Munda family, is spoken in 
eastern states in India. The language is said to have at 
least two dialectal groups; northern and southern. 

The main phonemic difference between these dialects
is the number of vowels: the northern dialect has eight
or nine vowels whereas the southern dialect has six
vowels, though dialectal study of the language has
yet to be elaborated and dialectal boundaries have not
been clearly shown.

The eight vowel system in the northern dialect is 
based on the Santal Dictionary (1932-1936) 
compiled by Rev. P. O. Bodding. The dictionary is 
well known as it distinguishes subtle differences in 
half-close vs. half-open contrast in both front vowels 
and back vowels. The dictionary is a monumental
work, providing not only descriptive linguistic data,
but also a rich reference of cultural information. The
digitalization of the dictionary, started in 1991 as an
Indo-Japanese joint research project, has been almost
completed except for some proofreading work.

The present paper reports an ongoing analysis of 
Santali phonology based on the dictionary. In order 
to decide whether the eight vowel system is really 
phonemic or not, co-occurrence of vowels in 
headwords in the dictionary is closely examined. 

1. Introduction 

Santali, the most populous member of the North Munda
subgroup of the Munda family, is spoken in eastern states of
India. The language is said to have at least two dialectal
groups, southern and northern. The main phonemic difference
between these dialects is the number of vowels: the southern
dialect has six vowels, whereas the northern dialect has eight
or nine vowels, though dialectal studies of the language have
yet to be elaborated and dialectal boundaries have not been
clearly shown so far.

Minegishi & Murmu (2001) is an example of a description of the
southern dialects, as it has a six-vowel system, /i, e, a, o, u, ə/,
the last of which seems to have been derived from /a/. The
description is based on the pronunciation of the second author,
who was born in East Singhbhum District of Jharkhand State, i.e.,
the southern part of the former Bihar State.

Rev. P. O. Bodding’s Santal Dictionary (1932-1936),
hereafter ‘BSD’, represents the northern dialects, as it
has an eight-vowel system /i, e, e, a, a, o, 0 , u/. BSD describes the
front vowel /e/ as having “several values, the mid-front-narrow
(like in Norw. fred), the mid-front-wide e (like in Engl. men),
or the mid-mixed-narrow (or wide) e”, and “/e/, the low-front-
narrow or low-front-wide sound, pronounced like the vowel in
Engl. air or dead”.

As to the back vowels, as Bodding describes, “/o/ is the
mid-back-narrow-round or the mid-back-wide-round vowel
sound, something like the sound in ‘note.’ The lips are not
much protruded”, and “/o/ is the low-back-narrow-round, the
low-mixed-narrow, or the low-back-wide round sound, long or
short, like in Engl. law, or not”.

Despite this phonetic variety, according to the above
description the basic vowel system of BSD in modern phonemic
symbols would be /i, e, ɛ, ə, a, ɔ, o, u/.

BSD is a monumental work, providing not only descriptive
linguistic data, but also a rich reference of cultural information
on Santali society. In the 1990s, as part of an Indo-Japanese joint
research project, we decided to digitalize BSD so as to preserve
as much of the original information as possible.

The present paper is a preliminary report on the BSD
digitalization. Though inputting of the data has already been
completed, we have found that much work remains to be done.
Proofreading, one of the major tasks, needs some more time.
We will show what should be done besides proofreading in
order to use the data for phonemic analysis or for further
morphemic studies of Santali.


2. Structure of the Dictionary 


BSD was reprinted by Gyan Publishing House and is still
available, though owing to imprecise photocopying many
diacritics, which are crucial to phonetic and phonemic studies,
are not always discernible. Fortunately, the original version is
available in the library of the Tokyo University of Foreign
Studies, and we use it for our study.


BSD consists of five volumes. The total number of pages is
3,406. In BSD, Santali headwords are transcribed in the Roman
alphabet with diacritics and superscripts, and arranged in the
following order, where ‘h’ indicates an aspirated stop.


A, A, B, Bh, C, Ch, D, Dh, D, Dh, E, E, G, Gh, H, I, J, Jh, K, 
Kh, L, M, N, Nh, N, O, 0, P, Ph, R, S, T, Th, T, Th, U, V, W, 
Y 


The structure of each entry in BSD is as follows. 

A Santali headword (a compound word, or a phrase) is
followed by a comma, the abbreviation(s) of its part(s) of speech,
and a period. Its English counterpart is given as the second sentence.
In order to distinguish romanized Santali from English, the
former is printed in italics and the latter in upright type.
Santali example sentence(s), if any, follow, each with its
English translation. Additional information, such as etymology
or cross-references, is given within round brackets at the end
of the description.

An example of a headword and its description is given below.

hako, n. A fish. v. a., v. m. d. catch, get fish; v. m. become
full of fish. H. sapko calaoena, they went to catch fish; h. il,
fish fin; h. ko b&.r siketkoa, they caught fish by angling;
h. ketkoako, they caught fish; h. anako tehen, they got fish
to-day; noagadareko h.k kana, they get fish in this river; noa
bandre arh~o ~ ko h.yena, fish have again come into this tank;
h. n~a.h~i danda cele ho balenamletkoa, we did not get any,
neither fish nor anything. (Sakei, Besisi, Semang, Bahnar,
Sue, Annam ka; Khasi kha; Nicobar kde; Mundari, Birhor,
Ho haku.)

3. Digitalization of BSD 

To digitalize BSD, we need a set of transliteration rules for 
special diacritical symbols. Following the rules, the above 
entry of BSD is represented as follows: 

< hako >, n. A fish. v. a., v. m. d. catch, get fish; v. m. 
become full of fish. 

< H_0. sap A ko calaoena >, they went to catch fish; [... Usage 
Examples and Translation... ] 

(Sakei, Besisi, Semang, Bahnar, Sue, Annam ka; Khasi < kha >; 
Nicobar < ka=e >; < M_0uNDari, Birho+R >, Ho < haku >.) 

As shown above, words or phrases of Santali, or of other
languages, which need diacritical marks are transliterated using
numbers and symbols available on a keyboard, and are put within
angle brackets. The other parts, which are not in brackets, are in
English.
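
The bracket convention just described lends itself to simple mechanical processing. The following Python sketch is an illustration only: the function name `split_entry` and the regular expression are assumptions made for this example and are not the project's actual tools. It separates the transliterated spans (inside angle brackets) from the English text of an entry, using a shortened version of the etymological note quoted above.

```python
import re

# Hypothetical helper, not part of the BSD project's software.
# A transliterated span is whatever stands between '<' and '>'.
ANGLE = re.compile(r"<\s*([^<>]*?)\s*>")

def split_entry(entry: str):
    """Return (transliterated spans, remaining English text)."""
    transliterated = ANGLE.findall(entry)
    english = ANGLE.sub(" ", entry)
    english = re.sub(r"\s+", " ", english).strip()
    return transliterated, english

# Shortened from the digitized example entry shown above.
entry = "(Sakei, Besisi, Semang; Khasi < kha >; Nicobar < ka=e >; Ho < haku >.)"
forms, rest = split_entry(entry)
print(forms)  # ['kha', 'ka=e', 'haku']
```

Keeping the two kinds of text apart in this way would allow the transliterated forms alone to be passed on to pattern extraction, while the English glosses remain searchable as ordinary text.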

In total, [...] entries (words, compound words or
phrases) are in BSD. The total file size is 10,678,272 bytes
(about 10 MB). All the data have already been input and the second
proofreading has been almost completed. Part of the data is
now available via our web site, and the rest will be so in the
near future.

By providing BSD via the internet, information about
Santali traditional culture and society will become accessible
across the world. Our data will be sufficient for this purpose. If
we are to use the data in the areas of phonemics and
morphology, however, proofreading should be done again and
again, as precise reproduction of the BSD spelling will be needed.

The expected cycle of our research is as follows.

Step 1: Process the data to extract the syllabic patterns found in
the dictionary.

Step 2: If irregularities are found in the data, go back to the
original text to see whether these are really exceptional
cases or just mistyped.

Step 3: Re-process the data to check again.

This cycle of steps will be repeated. It involves an
enormous amount of work, but is necessary to find regularities
in the Santali phonemic patterns.
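
Step 2 of this cycle can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `flag_for_recheck` and the threshold `max_freq` are hypothetical, and the pattern 'CCCaC' is invented to stand in for a possible typing error. The frequencies of CaC and Ce2CCe are those reported later in this paper.

```python
from collections import Counter

def flag_for_recheck(pattern_counts, max_freq=1):
    """Return the patterns rare enough to be checked against the printed text."""
    return sorted(p for p, n in pattern_counts.items() if n <= max_freq)

# 'CCCaC' is an invented pattern used only for illustration.
pattern_counts = Counter({"CaC": 2542, "Ce2CCe": 11, "CCCaC": 1})
print(flag_for_recheck(pattern_counts))  # ['CCCaC']
```

After the flagged headwords have been compared with the original volumes and corrected where necessary, the counts would be recomputed (Step 3) and the check repeated.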

4. Results of the Analysis 

The provisional agenda for Santali phonology is:

Q.1 How and why do phonetic differences, i.e. six- and eight-
vowel systems, exist among Santali dialects?

Q.2 Historically, did the eight-vowel system of the northern dialect
derive from a five-vowel system, or vice versa?

Q.3 How are the eight vowels phonetically or phonemically
conditioned?

These questions, which are related to each other, are difficult to
answer. Now that the BSD data are available, we can consider the
third question on the basis of a large amount of data. As an initial
stage of our linguistic study, headwords are extracted and classified
to examine the phonetic and phonemic conditions of each vowel.
Taking the BSD headwords as an example, we will see what
we should do now.

4.1. Syllable Patterns in BSD Headwords

By representing a consonant as ‘C’, we extract 3446 syllable
patterns and count the frequency of each pattern in the BSD
headwords. For example, ‘CaC’, that is, monosyllabic /a/
flanked by any consonants, appears most frequently
(2542 times).
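
The extraction just described can be sketched in Python as follows. This is not the program actually used in the project; the function name, the toy headword list and the spellings with ‘e2’ are assumptions of this example. In particular, an ‘h’ directly after a consonant is treated here as aspiration and not counted as a separate consonant, an assumption inferred from the fact that jhedge is counted under Ce2CCe later in this paper.

```python
from collections import Counter

VOWELS = set("aeiou")

def syllable_pattern(headword: str) -> str:
    """Replace every consonant by 'C'; keep vowels and their digits."""
    pattern = []
    after_consonant = False
    for ch in headword.lower():
        if ch in VOWELS or ch.isdigit():
            pattern.append(ch)
            after_consonant = False
        elif ch.isalpha():
            # Assumption: 'h' right after a consonant marks aspiration
            # and does not count as a consonant of its own.
            if ch == "h" and after_consonant:
                continue
            pattern.append("C")
            after_consonant = True
        else:
            after_consonant = False  # spaces, hyphens etc. are skipped
    return "".join(pattern)

# Toy input; the 'e2' spellings are assumed transliterations.
headwords = ["hako", "me2nte", "he2nte", "jhe2dge"]
counts = Counter(syllable_pattern(w) for w in headwords)
print(counts.most_common(1))  # [('Ce2CCe', 3)]
```

Applied to the full list of headwords, `counts.most_common()` would give a ranking by frequency of the kind summarized in Table 1.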

Table 1 shows the rank order by frequency, the range of frequency,
and the number of patterns for the syllable patterns.


Order in Fr.   Range of Fr.   No. of Pat.  |  Order in Fr.     Freq.        No. of Pat.

No. 1-8        1000 or more   8            |  834-988          4            155
9-24           999-500        16           |  989-1222         3            234
25-56          499-200        32           |  1223-1771        2            549
57-97          199-100        41           |  1772-3445        1            1674
98-171         99-50          74           |
172-318        49-20          147          |
319-520        19-10          202          |
521-833        9-5            313          |

(No. 1-833)    (Subtotal)     833          |  (No. 834-3445)   (Subtotal)   2612


Table 1. Frequencies of Syllable Pattern 

Hereafter, BSD’s contrast [e] vs. [e] is shown as ‘e’ vs.
‘e2’, and [o] vs. [o] as ‘o’ vs. ‘o2’, respectively. The most
frequent patterns, Nos. 1 to 8, appear more than 1,000 times.
The frequency of each pattern is hereafter given within round
brackets. These are: CaC (2542), CaCaC (2244), Co2Co2C
(1686), CaCa (1568), CuCuC (1542), CaCCa (1207), CuC
(1195), Co2C (1173).

The above result shows that Santali prefers monosyllabic 
and disyllabic patterns, and that in the disyllabic cases, the 
same vowel tends to be repeated. 

Though the whole data should be proofread carefully,
patterns with low frequency should be treated most carefully.
For example, the rarely appearing patterns in the right columns
in Table 1 might be either loanwords from foreign languages,
or simply mistyped.

4.2. ‘e’ and ‘e2’ contrast

In this paper we examine the phonemic conditions for ‘e’ and
‘e2’ only. The following method of data processing could also
be applied to other phonemes. Among the BSD entries, we first
extract the patterns in which either ‘e’ or ‘e2’ is included. The
number of such patterns is 1,258. These patterns appear in total
12,175 times.

The patterns which appear most frequently are as follows:

CaCCe (780), Ce2Ce2C (752), Ce2C (546), Ce2Ce2 (514), CaCe
(401), CeCa (330), CeCCa (306), Ce2CCe2C (269), CeCaC (239),
Co2Ce2 (218), CeC (205), CeCeC (193),...

It should be noted that ‘e2’ patterns such as ‘Ce2C’ and ‘Ce2Ce2C’
appear more frequently than ‘CeC’ and ‘CeCeC’ do.
Minimal pairs should be checked by contrasting patterns
such as ‘CeC’ and ‘Ce2C’.

Among the above set of entries, we extract the patterns with two
or more ‘e’s or ‘e2’s. There are 248 such syllable patterns.
These patterns appear in total 2,946 times. Among them, the
following are the most frequent patterns.

Ce2Ce2C (752), Ce2Ce2 (514), Ce2CCe2C (269), CeCeC (193),
Ce2CCe2 (144), e2Ce2C (144), Ce23Ce23 (95), CeCe (78), e2Ce2
(76), e2CCe2 (34), Ce23Ce23C (29), CaeCe (20), CeCCeC (20),
eCeC (20), Ce2Ce2Ce2C (18), e2CCe2C (18), Ce2CCCe2C
(17), Ce2CCe2CaC (15), CeCaCe (14), CeCCe (14), Ce2CCe (11),
Ce2Ce2CCe2C (11), CeCea (11), Ce2CCCe2 (10),...

Among the above subset, by further extracting those with at
least one ‘e2’, we get 2394 words. By again extracting from
these those with ‘e’, we get the patterns in which
co-occurrence of ‘e’ and ‘e2’ is found.

There are 74 patterns which include co-occurrence of ‘e’ and
‘e2’. As was expected, the number of occurrences is very small:
149 in total.

Ce2CCe (11), Co2Ce2Ce (8), e2CCe2Ce (8), CaCe2Ce (7), 
Ce2CCe2Ce (6), Co2CCe2Ce (6), o2CCe2Ce (6), Ce2Ce (5), 
Ce2Ce2Ce (4), Ce2CeC (4). 

It should first be noticed that when ‘e’ and ‘e2’ co-occur
in a pattern, their order is always ‘e2’ then ‘e’, not vice versa.
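
The two filters used here, co-occurrence and ordering, can be expressed as a short Python sketch. The function names are hypothetical and the code is only an illustration of the procedure, not the program actually used. The first six patterns are taken from the lists above; ‘CeCe2’ is an invented counter-example.

```python
import re

# 'e2' must be tried before plain 'e' so that it is read as one symbol.
TOKEN = re.compile(r"e2|e")

def e_sequence(pattern: str):
    """Return the 'e'/'e2' symbols of a pattern in order of appearance."""
    return TOKEN.findall(pattern)

def has_cooccurrence(pattern: str) -> bool:
    seq = e_sequence(pattern)
    return "e" in seq and "e2" in seq

def e2_precedes_e(pattern: str) -> bool:
    """True if no plain 'e' is followed later by an 'e2'."""
    seen_plain_e = False
    for symbol in e_sequence(pattern):
        if symbol == "e":
            seen_plain_e = True
        elif seen_plain_e:
            return False
    return True

patterns = ["Ce2CCe", "Co2Ce2Ce", "e2CCe2Ce", "CaCe2Ce", "Ce2Ce2C", "CeCeC"]
mixed = [p for p in patterns if has_cooccurrence(p)]
print(mixed)                               # ['Ce2CCe', 'Co2Ce2Ce', 'e2CCe2Ce', 'CaCe2Ce']
print(all(e2_precedes_e(p) for p in mixed))  # True
print(e2_precedes_e("CeCe2"))              # False (invented counter-example)
```

A pattern for which `e2_precedes_e` returned False would be a candidate for re-checking against the printed dictionary.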

Considering the fact that in disyllabic cases the same
vowel tends to be repeated in Santali, the above set of patterns
is exceptional and should therefore be examined carefully.
Though this examination has not been done yet, a few
examples of these patterns are given below.

Ce2CCe appears most frequently among the above results: 11
times.

By searching the original data, we find mente 9 times, and hente
and jhedge once each. Though these are listed as part
of the headwords in BSD, from a morphological point of view
they might be further separated as /men/ or /hen/ + /te/, or
/jhed/ + /ge/. If so, these are not exceptional disyllabic
patterns, but sequences of two separate monosyllabic
morphemes.
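
The tentative segmentation suggested here can be tried out mechanically. The sketch below is an assumption-laden illustration: the function name `split_suffix` is hypothetical, the suffix list contains only the two elements /te/ and /ge/ mentioned above, and the forms are written without diacritics. A purely string-based split of this kind will over-segment, so its output is a list of candidates for inspection, not an analysis.

```python
SUFFIXES = ("te", "ge")  # only the elements mentioned in the text

def split_suffix(headword: str, suffixes=SUFFIXES):
    """Split off a final suffix if one matches; otherwise return (word, None)."""
    for suffix in suffixes:
        if headword.endswith(suffix) and len(headword) > len(suffix):
            return headword[: -len(suffix)], suffix
    return headword, None

for word in ["mente", "hente", "jhedge", "hako"]:
    print(word, split_suffix(word))
# mente ('men', 'te')
# hente ('hen', 'te')
# jhedge ('jhed', 'ge')
# hako ('hako', None)
```

If the stems obtained in this way are themselves attested as monosyllabic headwords, the case for treating the pattern as two morphemes would be strengthened.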

Co2Ce2Ce is the second most frequent pattern: 8 times.

These are: hotere, hotete, kolete, mot.here, notere, notete,
nhotete, porete, each of which appears only once among the
headwords.

Again, it seems plausible to separate the final syllables te and re
as monosyllabic morphemes.

5. Tentative Conclusion

The digitalization of BSD is now proceeding, though further
proofreading is necessary. The data can be used not only for
finding the English meanings of Santali headwords, but also for
reference to Santali traditional culture and society.

In this paper, we use the data for phonemic analysis. Taking
the ‘e’ and ‘e2’ contrast as an example, we analyzed the
phonemic conditions found in their syllabic patterns.

Though more examination is needed, it is now apparent that
the ‘e’ and ‘e2’ contrast is not a full-fledged vowel contrast.

1. In general, ‘e2’ occurs more frequently than ‘e’.

2. In disyllabic patterns, both ‘e2’ and ‘e’ tend to be repeated.

3. When ‘e2’ co-occurs with ‘e’, the order is ‘e2’ then ‘e’, not
vice versa.

4. In cases of such co-occurrence, the patterns might be
analyzed further into different morphemes.

EXPRESSIVES IN MUNDARI*

ABSTRACT

Mundari has a rich system of expressives. This paper
explores the rich system of expressives in this
language. It also considers the views of
Gerard Diffloth, who was responsible for the term
‘expressives’, having used it in relation to one of the
Austroasiatic languages; the term has since been adopted in
other studies of the Indian linguistic area.


1. Introduction 

Mundari has a rich system of expressives. The term
‘expressive’ was suggested by the father of studies of
Austroasiatic expressives, Gerard Diffloth (1976:263-264), on
the occasion of the 1st Austroasiatic Linguistics Conference
organized by the University of Hawaii in 1973, and adopted by
the father of studies of the Indian linguistic area, Murray
Emeneau [...], in the South Asian context, in the following:

‘expressive’ is the most inclusive term for a form class
with semantic symbolism and distinct morphosyntactic
properties; ‘ideophones’ are a subclass in which the
symbolism is phonological; ‘onomatopoetics’ are
ideophones in which the reference of the symbolism is
acoustic (i.e. imitative of sounds). Since the ideophones
may have reference not only to sounds, but to any other
objects of sense, including internal feelings as well as
external perceptions (sight, taste, smell, etc.), and since the
Indo-Aryan/Dravidian items already examined have this
very wide type of reference, the broadest term
‘expressives’ seems appropriate.

I have once written about the Mundari expressives in my
grammar (Osada 1992:140-144). I could not, however, deal
there with the syntactic and semantic properties of
expressives. Thus I will describe here the morphology, syntax
and semantics of expressives, as well as their sound symbolism.

2. Morphology of Expressives 

Expressives can be divided into the following types on the basis
of their word formation:

(A) Identical Reduplication

(B) Partial Reduplication

(C) Vowel Mutation


(A) Identical Reduplication

This type of expressive should be distinguished from verbal
reduplication, which is clearly derived from a verbal base. A
salient feature is that the basic unit which is reduplicated
has no meaning by itself. Thus,


Expressive form      Meaning

suyu'n suyu'n        ‘lean and small (person)’
kase kase            ‘to look askance at (a person)’
mondor mondor        ‘a smell of rice beer’
mogo mogo            ‘a smell of flowers’
kata kata            ‘haw-haw’


(B) Partial Reduplication 

A partial reduplication is formed of two elements. The
second element is a partial reduplication of the first element.
We can categorise it in the following way:

(i) CVX pVX

Expressive form      Meaning

riti piti            ‘very small leaves as those of tamarind’
risuri pisuri        ‘the act of showing the teeth again and
                     again’
lata pata            ‘to make a stew thick, pasty’
latar patar          ‘a mixture of truth and lies wherein one
                     does not know what to believe’
lede'n pede'n        ‘so fat that in walking he has
                     difficulty’

(ii) CVX bVX

Expressive form      Meaning

kau bau              ‘to do uncomfortably or uneasily’
kered bered          ‘a quarrelling and fighting
                     disposition’
cere bere            ‘chattering and twittering of numerous
                     birds’
ladi badi            ‘to put things in a disorderly manner, more
                     or less one over another’
sador bador          ‘the act of letting bits fall whilst eating or of
                     strewing bits all around by pecking’

(iii) CVX mVX

Expressive form      Meaning

celo'n melo'n        ‘naughty boy’
ce'ngol me'ngol      ‘shamelessness’
jiki miki            ‘rippling and glittering water’
seled meled          ‘mixture of different kinds of grain, etc.’
gero mero            ‘a shamed face or a crying face’

(iv) CVX kVX

Expressive form      Meaning

dale kale            ‘negligent (of taking care)’
hati kuti            ‘various kinds’

(v) CVX gVX

Expressive form      Meaning

rain gain            ‘good or bad principles of conduct’ (EM)
mane game            ‘want of punctuality in starting, dilatoriness’
                     (EM)

(vi) CVX cVX

Expressive form      Meaning

repo cepo            ‘shrivelled’
dukur cukur          ‘uneasiness of mind’

(vii) CVX jVX

Expressive form      Meaning

re'nge je'nge        ‘the condition of getting bothered or being
                     subjected to trouble or annoyance’
hauru jauru          ‘desultory talk or conversation, passing
                     from one subject to another without order
                     or natural connexion’
runu junu            ‘to go or walk with difficulty due to a
                     handicap’

(viii) CVX dVX

Expressive form      Meaning

rawa dawa            ‘opportunity to do something reprehensible,
                     because there is nobody to interfere’

(ix) CVX tVX

Expressive form      Meaning

ribuy tibuy          ‘the act of fat people, walking with the
                     buttocks rubbing against each other’
roka toka            ‘quickly’

(x) CVX sVX

Expressive form      Meaning

boro soro            ‘cowardice’

(xi) CVX rVX

Expressive form      Meaning

tiri riri            ‘the sound of a flute’

(i), (ii) and (iii) are very common.
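
The classification (i) to (xi) rests on one test: the second element repeats the first element except for its initial consonant. As a rough illustration only, this test can be written as a Python function. The function name is hypothetical, the forms are given without diacritics, and the code assumes that the initial consonant is written with a single letter.

```python
VOWELS = set("aeiou")

def partial_reduplication_type(expressive: str):
    """Return e.g. 'CVX pVX' for 'riti piti', or None if the form does not fit."""
    parts = expressive.split()
    if len(parts) != 2:
        return None
    first, second = parts
    if not first or not second:
        return None
    if first[0] in VOWELS or second[0] in VOWELS:
        return None
    # Same remainder (VX), different initial consonant.
    if first[0] != second[0] and first[1:] == second[1:]:
        return f"CVX {second[0]}VX"
    return None

for form in ["riti piti", "kered bered", "jiki miki", "tiri riri", "kata kata"]:
    print(form, "->", partial_reduplication_type(form))
# riti piti -> CVX pVX
# kered bered -> CVX bVX
# jiki miki -> CVX mVX
# tiri riri -> CVX rVX
# kata kata -> None
```

Identical reduplications such as kata kata are rejected by design, since they belong to type (A).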

(C) Vowel Mutation

(i) (C)aC[(C)a(C)] (C)uC[(C)u(C)]

Expressive form      Meaning

dala dulu            ‘a fat and short person’
lada ludii           ‘a fat child’
tapai] tupu\]        ‘baby tries to walk’
tagam tugum          ‘the fat person who cannot walk swiftly’
lada ludu            ‘a fat baby’

(ii) CaC[a(C)(a)] CoC[o(C)(o)]

Expressive form      Meaning

sar sor              ‘to eat away with a savage appetite’
karay koroy          ‘a gurgling breathing of one being
                     strangled’
rakara rokoro        ‘the rattling of something in a box or in a
                     bottle or the like’
da'n do'n            ‘a deep and big hole’
pagad pogod          ‘a swollen state of the whole body’

(iii) CaC[aC] CiC[iC]

Expressive form      Meaning

palad pilid          ‘the act of shining in various places’
par pir              ‘the act of dispersing’

(iv) CaC[(C)aC] CeC[(C)eC]

Expressive form      Meaning

pa'ngad pe'nged      ‘a glitter of light appearing and
                     disappearing now here, then there’
ca\] ce ij           ‘used for the cry of babies’ (EM)

(v) CiCa(C) CoCo(C)

Expressive form      Meaning

kidar kodor          ‘a cock with a long upright comb and long
                     wavy feathers on the neck and tail’ (EM)
kira'n koro'n        ‘a tall and lean person’
kica koco            ‘a tall cock’ (EM)
gida godo            ‘semi-liquid things’
pica poco            ‘to empty a soft or pasty substance by
                     compression’

(vi) CiC CoC

Expressive form      Meaning

bir bor              ‘tall and straight’
lir lor              ‘a long and weak sapling’
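
The vowel mutation types can be checked in a similar mechanical way: the consonants of the two elements must be identical, while the vowels change. The following Python sketch is an illustration under the assumptions that forms are written without diacritics and that each vowel is a single letter; the function name is hypothetical.

```python
VOWELS = set("aeiou")

def vowel_mutation(expressive: str):
    """Return the sorted vowel changes, or None if the consonants differ."""
    parts = expressive.split()
    if len(parts) != 2 or len(parts[0]) != len(parts[1]):
        return None
    changes = set()
    for a, b in zip(parts[0], parts[1]):
        if a in VOWELS and b in VOWELS:
            if a != b:
                changes.add((a, b))
        elif a != b:
            return None  # consonant skeleton is not identical
    return sorted(changes)

print(vowel_mutation("dala dulu"))    # [('a', 'u')]
print(vowel_mutation("kidar kodor"))  # [('a', 'o'), ('i', 'o')]
print(vowel_mutation("palad pilid"))  # [('a', 'i')]
print(vowel_mutation("riti piti"))    # None
```

An empty list is returned for identical reduplications, so that the three formation types (A), (B) and (C) can be told apart by combining this check with a test for changed initial consonants.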

3. Syntax of Expressive 

The syntax of expressives has never been described. An expressive
can occupy any slot, i.e. that of predicate, complement or
argument. As the head of a predicate, an expressive can take
derivational suffixes, e.g. passive, reflexive and benefactive, as
well as aspect markers. Expressives can also form serial verb
constructions. Thus,

(1) businfre seta-hon-e utul-putul-ta-n-a. 
straw-LOC dog-child=3SG:SUB EXPR-PROG-INTR-IND 

‘The puppy is playing in the straw, so the straw is
shaking.’

(2) nir-nir-te-’f _ angor-sangor-gi ri-akci-n-a. 

run-run-ABL=3SG:SUB EXPR-throw away-CONT-INTR-IND 

‘As s/he is running and running then s/he is totally getting 
out of breath.’ 

Some expressives require an experiencer object, as in
experiential constructions. For instance,

(3) rua-te alae-balae-ki-'iyil-a.

fever-ABL EXPR-COMPL-TR-1SG:OBJ-IND
‘I was troubled by a fever.’

An expressive alone, or an expressive with the progressive
aspect marker -ta and the intransitive marker -n, can occupy
the complement slot as an adverbial phrase, as in the following:

(4) kata-kata=e landa-ta-n-a.

EXPR=3SG:SUB smile-PROG-INTR-IND
‘S/he is laughing uproariously.’

(5) iri\)-iri\)-ta-n=(e)-m landa-ta-n-a.
EXPR-PROG-INTR=EPEN-2SG:SUB smile-PROG-INTR-IND

‘You are smiling as if mocking someone.’

An expressive can occupy the argument slot to modify a
noun or noun phrase. For example,

(6) ini\]~do janao ako\]-bako'\) _ ho ro-ge. 

that person-TOP always EXPR person-EMPH 
‘S/he is always a stupid person.’ 

An expressive can occupy the head of a noun phrase, as in the
following instance:

(7) ini\]-a\)_ _ isi ri-siki ri ka=h suku-a. 

that person-GEN EXPR NEG=1SG like-IND 
‘I don’t like her coquettish laughing.’ 

As is seen above, these are identically reduplicated forms.
Although the single form usually has no meaning, some single
forms followed by the completive aspect marker -ke
and the intransitive marker -n occupy the complement slot as
an adverbial phrase:

(8) til— ij_ cad ta-cad, ta-ke-d-a. 

hand=3SG:SUB clap:EXP-COMP-TR-IND 

‘S/he clapped her/his hand.’ 

(9) cad ta-ke-n=e\ j_ tab 

clap-COMPL-INTR=3SG:SUB slap-ANT-TR-3SG:OBJ-IND 

‘S/he slapped him/her like clapping.’ 

4. Semantics of Expressive 

Nobody has ever described the semantics of expressives.
Hoffmann merely listed several expressive forms as
variants in the Encyclopaedia Mundarica, which consists of 16
volumes. For example, the following thirteen forms make up
a single entry for ‘a smile; to smile etc.’:

mogoeij, mogoeij-mogoeij, mergoeij, mergoeij-mergoeij, merlo'n,
merlo'n-merlo'n, mirlu'n, mirlu'n-mirlu'n, moeij-moeij, muguiij,
muguiij-muguiij, musuiij, musuiij-musuiij.

According to my informants, some forms, i.e. mogoeij-mogoeij,
mirlu'n-mirlu'n and moeij-moeij, are not known to them because
of dialectal differences. They can, however, differentiate the
meanings of the following:


mergoeij-mergoeij    ‘smiling in mouth’
merlo'n-merlo'n      ‘smiling by children or aged persons who
                     have no teeth’
muguiij-muguiij      ‘smiling cheekful’
musuiij-musuiij      ‘smiling in eyes shyly’


Apart from these, there are a lot of expressives which express the
action of laughing etc. Thus I present the semantic
field of laughing, smiling and chuckling below 1.


hada-hada            ‘to roar with laugh successively’
kata-kata            ‘to roar with laugh (less than hada-hada)
                     by many people’
kaij-kaij            ‘to laugh like a hen’s clucking’
keij-keij            ‘to laugh like a jackal’s howling’
keteij-keteij        ‘to laugh innocently (by children)’
ko~e-ko~e            ‘to roar with laugh without sound’
kere-kete            ‘to laugh in talking’
isin-isin            ‘to ridicule one’s action or talk’
isiri-sikiri         ‘to laugh coquettishly’
iriij-iriij          ‘to laugh as if mocking’


I give another example, that of expressives for light reflection, in
the following:

jaka jaka            ‘shining with gold’
jaka maka            ‘shining with a flashy dress (sari with
                     gold)’
jiki miki            ‘shining with leather’
caka maka            ‘shining with steel or silver’
jili mili            ‘shining with building’
jilib jilib          ‘dazzle with electric light’
bijir bijir          ‘lighting’
jilab jolob          ‘glimmering with a firefly’
jolob jolob          ‘glimmering with many fireflies’
jaran jaran          ‘glittering in the sun’
pirid pirid          ‘glimmering on the sand’
pilid pilid          ‘twinkling with stars’


5. Sound Symbolism


As far as sound symbolism is concerned, ‘it is often said
that if vowel quality is used for size symbolism, [i] will
symbolize smallness, and the lower vowels, especially [a], will
symbolize largeness, with degrees in between’ (Diffloth
1994:107). Diffloth, however, has suggested a counter-example
(i: big, a: small) from Bahnar, which also belongs
to the Austroasiatic language family.

In Mundari, it seems to me that i symbolizes smallness
while a symbolizes largeness, as in the following:


sata-sata            ‘a passing rain for a long time’
siti-siti            ‘a passing rain’
jaram-jaram          ‘a heavy rain (the water fills the river)’
jirim-jirim          ‘a heavy rain (the water fills the
                     rice-field)’
kaca-kaca            ‘to scold somebody with action’
kici-kici            ‘to scold somebody only by mouth’


The following cases should be taken into consideration:

baya-baya            ‘to act lazily’
buyu-buyu            ‘to act, especially to walk, lazily (lazier
                     than baya-baya)’
pisir-pisir          ‘to drizzle (not to get wet without
                     umbrella)’
pusur-pusur          ‘to drizzle (but to get wet)’

Thus we consider the sound symbolism of vowel quality in
Mundari tentatively as u > a > i.

6. Conclusion 

Mundari has a rich expressive system, as shown above. We
need further work to describe this system in an exhaustive
manner. I am compiling a dictionary of expressives in
Mundari. Since Diffloth gave a paper on the expressives in
Semai at the First International Conference on Austroasiatic
Linguistics in 1973 in Hawaii, expressives have been one of the
main topics in Austroasiatic linguistics; e.g., Diffloth 1976a for Jah
Hut; Diffloth 1976b and Hendricks 2001 for Semai; Benjamin
1976:177-178 for Temiar; Svantesson 1983:115-125 for
Kammu; Kruspe 2004 for Semelai; Burenhult 2002:162-164
for Jahai; Diffloth 2002 for Surin Khmer; Migliazza 2005 for
So (Bru); DiCanio 2005 for Mon and Khmer, etc. Thus we
need to undertake a comparative study of Austroasiatic expressive
systems in the near future. Apart from Austroasiatic, expressives
are one of the areal features from South Asia to East Asia. We
should study these expressives typologically in future.

NOTES 

* This study is supported by a Grant-in-Aid for Scientific Research (C)
No. 18520345 from the Ministry of Education, Culture, Sports, Science
and Technology. This paper overlaps with my forthcoming paper
‘Mundari’ in Gregory D.S. Anderson (ed.) Munda languages,
Routledge, 2008. I would like to express my sincere thanks to
Gregory Anderson and Nicholas Evans of Melbourne University
for reading and commenting on the earlier version of this paper.

1 I do not repeat the above-mentioned expressives here.

ABSTRACT

The topic of this contribution is two types of multi-verb
constructions in Gorum, a South Munda
language spoken in Orissa/India. The constructions
of both types consist of two finite verbs which are
closely coordinated. While one of these construction
types is composed of two main verbs and has some
characteristics of serial verb constructions, which are rather rare
among South Asian languages (and with them among
Munda languages), the other construction type
contains a main verb and a kind of auxiliary
resembling the explicator compound verbs very common
in South Asia.

I will discuss the morphological and syntactical 
properties of these constructions and especially focus 
on the similarities and differences of the connection 
between the verbs in these constructions. I will 
furthermore discuss their status in the grammar of 
Gorum as well as their characteristics in respect to 
genetically and geographically closely related 
languages. 


Introduction 

In this paper I will discuss the grammar of verbs in Gorum, a
South Munda language of Orissa and Andhra Pradesh (India).
The focus will be on the interaction of morphological and
syntactic aspects of the main predicate position of a sentence.
For this purpose I will present the morphological structure
required by the syntactic position that accommodates the main
predicate and is displayed in simple mono-verbal predicates.
Subsequently I will use evidence from multi-verbal complex
predicates to show that the morphosyntactic analysis derived
from the simple predicates is oversimplified and that an
additional category is necessary to account for the
morphosyntax of multi-verbal predicates in Gorum. This
category, constituted by all verbs of the predicate of one sentence,
will be called the verbal complex. I will show that this notion can
be very fruitfully employed to capture generalizations over
different phenomena in the morphosyntax of Gorum.

1. Simple Predicates 

In this section I will present the morphology and syntax of
simple predicates. In Gorum, all predicates that consist of one
verb (and its non-verbal complements), regardless of their
morphological complexity, should be regarded as simple
predicates. Although mono-verbal predicates can display
considerable complexity in Gorum, the main division is
between multi- and mono-verbal predicates. In the following,
the relevant morphological categories are presented and
subsequently some basic facts of Gorum syntax are introduced.

1.1. Morphology 

The morphology connected with the syntactic category verb is
arguably the most complex part of the morphology of Gorum.
There are still phenomena that are not yet well understood and
certainly some have not been noticed at all. In the following, I
will present what is known about the verb in Gorum,
unfortunately not in as much detail as it deserves. For further aspects
see Aze (1973), Zide (1972, 1990) and Anderson &
Rau (forthc.).

1.1.1. Stem 

In contrast to nominal stems, which have to be bimoraic in
Gorum (reconstructed by Anderson & Zide (2001) as a general
feature of the language family), verbal stems do not seem to be
subject to a minimal word constraint. But although verbal stems
can be rather minimal in Gorum, they can also be remarkably
complex. Stems minimally consist of a root. But
morphophonological evidence suggests that the root plus three
additional formatives constitute the category stem that will be
discussed here. This unit is the domain of various phonological
rules or constraints. From a morphological point of view, one
can distinguish between lexical parts of the stem and
grammatical formatives. The lexical parts constitute the inner
part of the stem and consist of the root and, for reciprocals and
other verbal plurality forms, a reduplicated root (RDL). Both
are lexical units and not grammatical elements. The
reduplicated material is in most cases identical with the root,
as shown in (1). But due to a phonological constraint that
restricts the number of glottal elements (glottal stop, creaky
voice, preglottalized consonants) in the stem to maximally one,
the reduplicated material may not contain any glottal elements,
notwithstanding the presence of these elements in the root. An
example of this constraint is the reduplication in (2).

(1) zum ‘eat’ —► zumzum 

(2) ga'd ‘cut’ —> gaga'd 1 
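The reduplication pattern in (1) and (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the apostrophe is taken to mark a glottalized consonant as in the paper's transcription, and the whole glottalized consonant is assumed to be dropped from the copy, which is what (2) suggests but which two examples cannot establish in general.

```python
# Hypothetical sketch: function names and the handling of "'" are assumptions
# made for illustration, not part of the original analysis.
GLOTTAL_MARK = "'"  # prefixed apostrophe = glottalized consonant


def reduplicant(root: str) -> str:
    """Return a copy of the root with every glottalized consonant removed."""
    copy = []
    i = 0
    while i < len(root):
        if root[i] == GLOTTAL_MARK:
            i += 2  # skip the apostrophe and the consonant it marks
        else:
            copy.append(root[i])
            i += 1
    return "".join(copy)


def reduplicate(root: str) -> str:
    """Prefix the glottal-free copy to the unchanged root (RDL- + ROOT)."""
    return reduplicant(root) + root


print(reduplicate("zum"))   # zumzum, as in (1)
print(reduplicate("ga'd"))  # gaga'd, as in (2)
```

Because the copy never contains a glottal element, the reduplicated stem keeps at most the one glottal element contributed by the root.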


Additionally the stem may not be altered in its
phonological or prosodic structure. This results from an
alignment of the syllable structure with the borders of the
stem. This principle is maintained even if it results in a
non-optimal syllable structure, as is the case in (3), where /zu.mu/
would be better from the point of view of Gorum syllable
structure, as it conforms to a maximal onset principle and thus
avoids the non-optimal syllable /u/. But the actual realization is
/zum.u/, in which the stem /zum/ does not undergo any
resyllabification and the right edge of the stem remains aligned
with the right edge of the syllable.

(3) zum-u
eat-INF

1 Voiced stops with a prefixed apostrophe represent the glottalized phonemes. Thus /'d/
represents /ʔd/, which occurs either unreleased or with a nasal release. For a short
description of Gorum phonology see Anderson & Rau (forthc.).


The same alignment of prosodic and morphological 
structure applies to the causative prefix (CAUS) ab-. The 
syllable structure of the causative prefix remains unchanged 
even when a preceding formative causes non-optimal 
phonotactics, such as the hiatus in sentence (4). This behaviour 
contrasts with the morphophonology of the modality prefixes 
that lose their initial vowel to avoid a hiatus. Modality affixes 
occupy the slot preceding the causative prefix (example 5) and 
are discussed under 1.1.2 below. 

(4) ne-ab-so’e-t-om
1sS-CAUS-learn-F-2sO
‘I will teach you.’

(5) or-ab-so’e-iŋ
NEG/F-CAUS-learn-1sO
‘He/she will not teach me.’


The loan verb suffix -ej is obligatorily attached to borrowed
roots, which are mostly taken from Desia, the Indo-Aryan
lingua franca of the region where Gorum is spoken. The
evidence that this suffix is part of the stem is twofold. First, the
loan verb suffix can alter the prosodic structure of the
borrowed root. Thus the combination of the Desia root bil ‘to
dissolve’ and -ej results in the phonologically better /bi.lej/ and
not in the prosodic structure preserving /bil.ej/, in which the
prosodic and morphological edges would be aligned. The other
argument for considering the loan verb suffix part of the stem
comes from the affectedness marker (Zide, 1990). This
morpheme is realized by a glottal stop or creaky voice that
occurs in the rhyme of the syllable following the root or the
loan verb suffix, as in sentence (6). If the loan verb suffix is
considered part of the stem, this rule can be simplified to the
statement that the affectedness marker occurs in the rhyme of
the syllable following the stem.

(6) bil-ej-tuU
dissolve-LV-F/AFF
‘It will dissolve.’

The resulting templatic structure of the verbal stem is
represented in table 1.

CAUS-   RDL-   ROOT   -LV

Table 1: The morphological structure of a verbal stem

1.1.2. Modality

As mentioned above, there is evidence for the analysis that the 
modality and negation markers aj- and ar-/or-, unlike the
causative prefix ab-, are not to be considered part of the stem. 
The main rationale behind this is the fact that they are not in 
the scope of the prosodic stability rule that applies to the parts 
of the stem. Thus in example (7) the hiatus resulting from the 
morphotactic combination of the first person singular subject 
prefix ne- and the causative marker ab- is retained while in the 
phonotactically parallel examples (8) and (9) the hiatus is 
avoided by the deletion of the initial vowel /a/ of the past tense 
negation prefix ar- and the irrealis prefix aj- respectively. 
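The contrast between the stable causative prefix and the vowel-dropping modality prefixes can be made concrete with a small Python sketch. The function name, the vowel set and the flat string representation are assumptions for illustration only; the forms produced are the ones given in examples (7) to (9).

```python
# Hypothetical sketch of hiatus avoidance at the subject-prefix boundary.
VOWELS = set("aeiou")
MODALITY_PREFIXES = {"ar", "or", "aj"}  # outside the stem: may lose their vowel


def add_subject_prefix(subject: str, prefix: str, rest: str) -> str:
    """Join subject prefix + modality/causative prefix + remainder of the verb."""
    hiatus = subject[-1:] in VOWELS and prefix[:1] in VOWELS
    if hiatus and prefix in MODALITY_PREFIXES:
        prefix = prefix[1:]  # delete the initial vowel to avoid the hiatus
    # Any other prefix (e.g. causative ab-) is part of the stem and stays intact.
    return subject + prefix + rest


print(add_subject_prefix("ne", "ab", "zumtu"))  # neabzumtu (hiatus retained)
print(add_subject_prefix("ne", "ar", "zum"))    # nerzum
print(add_subject_prefix("ne", "aj", "zum"))    # nejzum
```

The only thing that distinguishes the two outcomes is membership in the modality set, which mirrors the claim that these prefixes fall outside the domain of the prosodic stability rule.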

(7) ne- + ab- + zum + tu → neabzumtu
‘I will feed him/her/it’

(8) ne- + ar- + zum → nerzum
‘I haven't eaten it (him/her)’

(9) ne- + aj- + zum → nejzum
‘I would have eaten it (him/her)’

1.1.3. Tense

Tense is marked in Gorum primarily by two morphemes,
the future tense suffix -tu and the past tense suffix -ru. Both
are directly suffixed to the stem. The future marker undergoes
no morphophonological changes, while the past tense suffix
assimilates to the stem coda if the stem ends in a liquid or a
nasal. Thus l + -ru results in l-lu and m + -ru appears as m-mu.
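The assimilation of the past tense suffix can be written out as a short Python sketch. The set of liquids and nasals below is an assumed inventory chosen for illustration, and the function names are hypothetical; only the l-lu and m-mu outcomes and the unassimilated -ru form are taken from the text.

```python
# Hypothetical sketch of tense suffixation; the segment inventory is assumed.
LIQUIDS_AND_NASALS = {"l", "r", "m", "n", "ŋ"}


def future(stem: str) -> str:
    """The future suffix -tu attaches without any change."""
    return f"{stem}-tu"


def past(stem: str) -> str:
    """The past suffix -ru copies a stem-final liquid or nasal."""
    final = stem[-1]
    if final in LIQUIDS_AND_NASALS:
        return f"{stem}-{final}u"  # l + -ru -> l-lu, m + -ru -> m-mu
    return f"{stem}-ru"


print(future("zum"))  # zum-tu
print(past("zum"))    # zum-mu
print(past("giU"))    # giU-ru
```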

1.1.4. Argument Marking 

Argument marking on verbs is a very interesting part of
Gorum morphosyntax and only a small portion of its
complexity can be covered here. Gorum has eight affixes to
indicate speech act participants on a verb and two forms to
mark third person plural subjects. A complete list of these
formatives is given in table 2. The argument marking system
for speech act participants is comparatively straightforward.
There are two different classes of affixes. The first class consists of
subject marking prefixes that occur in the slot preceding the
modality prefix. The other group consists of object
marking suffixes that follow the tense suffixes. Encoding person
and number, these affixes can occur simultaneously on one
verb and form a minimal but complete transitive sentence, as
in example (10).

(10) le-laU-t-om
1pS-hit-F-2sO
‘We will hit you.’

         SUBJECT    OBJECT
1P SG    ne-        -iŋ
1P PL    le-        -ileŋ
2P SG    mo-        -om
2P PL    bo-        -beŋ
3P SG    Ø          Ø
3P PL    -ej/-gi    Ø

Table 2: Argument markers

Third person marking follows rather different
morphosyntactic strategies. First of all, the only third
person category that can be marked on the verb is the third
person plural subject, as can be seen from table 2 above. None
of the three other categories is reflected in verbal morphology.
Although it represents a subject category, the third person
plural subject marker -ej is a suffix (example 11) and occupies
the same morphotactic position as the suffixes marking speech
act participant objects. This distribution can result in a conflict
when a third person plural subject occurs together with a
speech act participant object. This conflict is resolved by an
alternative form of third person subject marking, as can be seen
in example (12), in which the default 3pS-marker -ej is blocked
by the first person object marker -iŋ and replaced by the
marker -gi following the object suffix.
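The competition between the two third person plural subject markers can be summarized as a small decision rule. The Python sketch below is only an illustration: the function name and the hyphen-separated output are assumptions, and it covers just the object-suffix conflict described in this paragraph, with the forms of examples (11) and (12) as test cases.

```python
# Hypothetical sketch: choosing between -ej and -gi for a 3rd plural subject.
from typing import Optional


def third_plural_subject(stem: str, tense: str, obj: Optional[str] = None) -> str:
    """Build a verb form with a third person plural subject.

    Without an object suffix the default marker -ej fills the object slot.
    With a speech act participant object the slot is taken, so -gi is added
    after the object suffix instead.
    """
    if obj is None:
        return f"{stem}-{tense}-ej"
    return f"{stem}-{tense}-{obj}-gi"


print(third_plural_subject("laU", "t"))        # laU-t-ej    'They will hit him/her/it.'
print(third_plural_subject("laU", "t", "iŋ"))  # laU-t-iŋ-gi 'They will hit me.'
```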


(11) laU-t-ej
hit-F-3pS
‘They will hit him/her/it.’

(12) laU-t-iŋ-gi
hit-F-1sO=3pS
‘They will hit me.’

1.1.5. Post Object Suffixes

The formatives following the object suffix are not numerous
and cannot be easily subsumed under a common superordinate
label. The aforementioned alternative third person plural
subject marker -gi belongs to this group, as does the cislocative
suffix -aj, which occurs directly after the object suffixes, as in
example (14), and indicates that an event has an orientation to
the discursive deictic centre. Example (13) demonstrates that
the cislocative suffix is morphotactically located in a slot
between the object suffixes and the 3pS-marker -gi.

(13) baU-t-aj-gi
come-F-CISL-3pS
‘They will come.’

(14) guro'e aoU-r-iŋ-aj
shame be-P-1sO-CISL
‘I felt ashamed.’

There are two other formatives. The first is the imperative plural
marker -bu, which occurs in the first and second person
imperative plural and morphotactically follows the
cislocative marker (example 15). The second is the
progressive marker -ni, which, as in (16), has its place in the final
morphological slot following the third person plural marker
-gi.

(15) le-laU-aj-bu
1pS-hit-CISL-IMP/PL
‘Let us hit.’

(16) bae-gi-ni
come/P-3pS-PROG
‘They are coming.’

1.2. Syntax

As mentioned above, the syntax of Gorum, as in fact that of
Munda languages in general, is not yet well understood.
Therefore I will describe only those aspects of its syntax that are
known and relevant to this paper. The main structure of a
sentence can be assumed to be as in example sentence (17) and
can be schematically represented as S O V. Although the
predominant linear order is SOV, the word order as such is
pragmatically determined, as can be seen from example (18).
There is a fixed linear order for the elements that constitute what
one would like to call an NP, but the order of the argument
NPs of the verb as well as of adjuncts seems to be pragmatically determined.
From what is known so far, there is no syntactic evidence that
the representation of the Gorum sentence requires a VP
category. 2


(17) bileŋ bag kuso'd le-giU-ru
1pS_PRON two dog 1pS-see-P
‘We saw two dogs.’

(18) miumun bileŋ le-dor-r-aj boŋtel.
last_year 1pS_PRON 1pS-take-P-CISL buffalo
‘We brought a buffalo last year.’

The only other element whose position is syntactically
determined seems to be the honorific particle aom, which always
follows the verb directly. A structure as in (19) is thus the only
purely syntactically motivated restriction relevant to the
syntactic category V.

(19) baU-t-aj-gi aom
come-F-CISL-3pS HON
‘They will come.’


2 There are, however, some asymmetries between subject and object affixes, but they are not
reflected in the syntax of the respective NPs. A discussion of these aspects of Gorum syntax
would go beyond the scope of this paper.

But it must be emphasized that more research on the
syntax of Gorum is needed, especially with respect to
coordinated structures. It might thus turn out that a VP is
a necessary category for explaining some syntactic phenomena
in Gorum, but it is certainly not a prominent category in its
syntax. The facts known so far do not support the assumption
that an entity between the syntactic category V and the clause
is needed.

1.3. Summary

From the discussion above it can be assumed that Gorum has a 
flat sentence structure and that the morphosyntactic structure 
relevant to the syntactic category verb is similar to the 
structure represented in table 3. A verb in Gorum consists of a 
stem, which in turn consists maximally of a causative prefix, a 
simple or reduplicated root and in case of being a loan verb the 
loan verb suffix. A stem combines with a series of prefixes and 
suffixes and this unit constitutes a verb. The verb is optionally 
followed by the honorific particle. 


SUBJ-  MOD-  CAUS-  RDL-  ROOT  -LV  -TENSE     -OBJ  -CISL  -PL      -PROG  HON

1sS-   NEG-  CAUS-  RDL-        -LV  -F         -1sO  -CISL  -3pS     -PROG  HON
2sS-   IRR-                          -P         -2sO         -IMP/PL
1pS-                                 -INF/TR    -1pO
2pS-                                 -INF/INTR  -2pO
                                                -3pS

Table 3. The morphological structure of verbs (preliminary)

2. Complex Predicates

The morphological structure presented in the preceding section
and represented in table 3 adequately characterizes the
situation with simple predicates. But as it turns out, it is
oversimplified for complex predicates consisting of more than
one verb. In Gorum there is a group of constructions which
have predicates consisting of two and optionally more verbs.
These structures also provide insights into the structure of all
predicates in Gorum.

Three different types of multi-verb predicates will be
considered: nuclear verbal chains, explicator compound
verb constructions and the stative construction. All three
constructions consist primarily of two verbal positions, V1 and
V2, which require each of the respective verbs to be finite,
inasmuch as they carry argument, tense, modality, cislocative,
or any other marker required by morphosyntax. There are
nevertheless substantial differences between these
constructions that cannot be discussed in detail here. For a
discussion of the different multi-verb constructions
see Rau (to appear).

2.1. Nuclear Verbal Chains 

Nuclear verbal chains are series of independent verbs that
together denote one event, such as ɖuŋ ‘depart’ and ue ‘go’ in
sentence (20). The two verbs indicate different aspects of the
event, such as manner, direction of motion with respect to the
centre of discourse or some salient aspect of the surroundings.
In this type of construction V1 and V2 have the same argument
structure, carry the same tense, modality and other
grammatical features and generally display identical affixes.
There is no clear grammatical restriction on which verbs can be
part of a nuclear verbal chain.

(20) enuŋ-nu lok uɖubun ɖuŋ-ŋ-ej ue-j-ej
Enung-ATTR people yesterday depart-P-3pS go-P-3pS/AFF
‘The people from Enung went away yesterday.’

2.2. Explicator Compound Verbs

Explicator compound verb constructions (ECV) consist of two
verbs, but in contrast to verbal chains, the set of verbs that can
occur in the V2 position is restricted to laU ‘hit’, ue ‘go’, and
ta’e ‘give’. While the argument structure of the construction is
identical with the default argument structure of V1, both verbs
carry the same markers for arguments, tense, modality and
other morphosyntactic features. Sentence (21) shows a typical
example of the ECV construction with laU ‘hit’.

(21) sobu juU-r-ej laU-r-ej 

all pick_up-P-3pS hit-P-3pS 
‘They picked up everything.’ 

The semantic contribution of the different V2 is sometimes
subtle and not well understood. The ‘hit’ ECV construction
conveys that the action expressed by V1 is performed
vigorously or exhaustively and the ‘go’ ECV construction
brings out a negative estimation of the event, while the ‘give’
ECV construction conveys the estimation that the action was
performed voluntarily.

2.3. Stative Construction 

The stative construction combines a variable V1 with ɖuku ‘be’
in the V2 position. As in the other multi-verbal constructions
discussed, both verbs carry morphological markers in this
construction, as is demonstrated in example (22). But the tense
displayed on V1 is deducible from the tense value of the V2
ɖuku ‘be’. If the V2 is in past or neutral tense, V1 must carry a
past marker, and if V2 is in future tense, V1 will also show
future marking. Thus at least in some respects V2 governs V1.
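The tense dependency in the stative construction amounts to a simple mapping from the tense of V2 to the tense marking on V1. The Python sketch below only restates that mapping; the function name and the tense labels are hypothetical choices made for illustration.

```python
# Hypothetical sketch: the tense marking of V1 follows from the tense of V2.
def v1_tense(v2_tense: str) -> str:
    """Return the tense marking that V1 carries in the stative construction."""
    if v2_tense in ("past", "neutral"):
        return "past"    # V1 carries a past marker
    if v2_tense == "future":
        return "future"  # V1 also shows future marking
    raise ValueError(f"unknown tense value: {v2_tense!r}")


for tense in ("past", "neutral", "future"):
    print(tense, "->", v1_tense(tense))
```

The mapping is a function of V2 alone, which is the sense in which V2 can be said to govern V1.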

(22) bileŋ aauU-nu lok uɖubun sur ue-j-ej ɖuku-r-ej
1pS_PRON village-ATTR people yesterday hunt go-P-3pS/AFF be-P-3pS
‘The people of our village have gone hunting yesterday.’

3. The Verbal Complex

There are some phenomena common to all three types of
complex predicates introduced above that cannot be explained
with our current analysis of the verbal position in Gorum
syntax. Thus the markers of the PL slot (the alternative third
person subject marker -gi and the imperative plural marker -bu)
and the progressive marker -ni, which occurs in the subsequent
slot, are only allowed to appear after V2 and not after V1.
Although they only occur attached to V2, their scope extends
over V1 and V2. Thus in example (23) both verbs have a
third person plural subject and in (24) both verbs are in
imperative mood and have plural value. Sentence (25) is even
more instructive, as the past tense of both verbs can only be
accounted for through the requirements of the progressive.

(23) ɖuŋ-ŋ-aj ba'j-gi
depart-P-CISL come/P-3pS
‘They came over.’

(24) le-ɖuŋ le-b
1pS-depart 1pS/go-IMP/PL
‘They have come.’

(25) le-se’b-mu le-ta’j-ju-ni
1pS-chop-P 1pS-give-P-PROG
‘We are cutting it down.’

The difference between these three and the other formatives
manifests itself most clearly in the comparison between
sentence (23) above and (20), here repeated as (26). Both are
nuclear verbal chains and are semantically very similar. Both
have a third person plural subject. But in (23) the occurrence
of the default 3pS-affix -ej is inhibited by the presence of the
cislocative affix on V1 and by lexical restrictions on the verb ba’j
‘come’, which cannot take -ej in any case, and the third person
plural subject marking is performed by -gi following V2 alone.

(26) enuŋ-nu lok uɖubun ɖuŋ-ŋ-ej uj-j-ej
Enung-ATTR people yesterday depart-P-3pS go-P-3pS
‘The people from Enung went away yesterday.’

Since these three markers are not allowed to occur between
two verbs of a multi-verb construction, but obviously have
scope over both verbs, we have to assume that they attach to
the complex of V1 and V2 and thus constitute a category
distinct from the other formatives, which have been called affixes
here. In the following these three formatives will be called
clitics and will be separated by "=" instead of "-". Table 4
below represents the observed structures schematically.

*V-gi V-gi      [V V]=gi
*V-bu V-bu      [V V]=bu
*V-ni V-ni      [V V]=ni

Table 4: The distribution of the clitics =gi, =bu, and =ni
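The distribution in Table 4 can be expressed as a placement check on a sequence of verbs. The Python sketch below is a minimal illustration under stated assumptions: a verbal complex is represented as a list of strings, a clitic is written with "=" as in the table, and the function names are hypothetical.

```python
# Hypothetical sketch: clitics attach to the whole verb sequence, i.e. they
# may only follow the last verb of the complex.
from typing import List

CLITICS = ("gi", "bu", "ni")


def attach_clitic(verbs: List[str], clitic: str) -> List[str]:
    """Attach a clitic to the right edge of the verb sequence: [V V]=clitic."""
    if clitic not in CLITICS:
        raise ValueError(f"not a clitic: {clitic!r}")
    return verbs[:-1] + [verbs[-1] + "=" + clitic]


def placement_ok(verbs: List[str]) -> bool:
    """Reject any sequence in which a non-final verb carries a clitic."""
    return all("=" not in verb for verb in verbs[:-1])


print(attach_clitic(["V", "V"], "gi"))  # ['V', 'V=gi']
print(placement_ok(["V", "V=gi"]))      # True
print(placement_ok(["V=gi", "V=gi"]))   # False, cf. *V-gi V-gi
```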

As a starting point for further analyses of the grammatical
properties of the V1/V2 complex, the following example (27) is
of interest. The sentence in (27) is an instance of the stative
construction in which the now familiar verb ba’j ‘come’
occurs as the V1. This example allows us to get a better
understanding of the interaction of morphology and syntax in
these multi-verb constructions. The sentence exhibits an
infrequent constellation in which only one third person plural
subject marker is present, but contrary to our expectations it is
the 3pS-affix -ej, which occurs only on V2 and not on V1.

(27) ba'j ɖuku-r-ej
come/P be-P-3pS/AFF
‘They have come.’


To understand this structure we must recognize that there
are probably competing generalizations. The first principle at
work here is the lexical rule that the verb ba’j ‘come’ may not
combine with the 3pS-affix -ej.3 Thus the word *ba’j-ej is
ungrammatical and we normally find ba’j-gi. But this form is
ruled out because the resulting structure [V1=gi V2]
is syntactically ill-formed. The syntactically well-formed
structure [V1 V2]=gi is blocked by the morphological fact that
*ɖuku-r-ej=gi, with double third person plural subject
marking, is ungrammatical. But removing the 3pS-affix -ej
would not result in a grammatical form either, as *ɖuku-ru-gi
is also ill-formed, apparently because the occurrence of =gi
instead of -ej cannot be morphologically motivated. This line
of reasoning is supported by the fact that when the
occurrence of =gi is motivated, for example through the
presence of a cislocative affix on V2 as in example (28), the
structure [V1 V2]=gi actually appears.

(28) ba'j ɖuku-r-aj-gi
come/P be-P-CISL/AFF
‘They have come.’

So far we have seen that the multi-verb construction does
not allow the occurrence of clitics inside the [V1 V2] complex,
but that on the other hand the attachment of these clitics to V2
obeys the same purely morphological rules as with single
verbs.

Additional evidence from sentences like (29) suggests that
noun phrases cannot intervene between verbs in such a
multi-verbal complex. In fact the local adjunct asuŋ ‘house’ could be
placed between ‘laugh’ and ‘depart’, but this would create two
distinct complexes and change the meaning from a single event
of simultaneous laughing and moving to a sequence of a
laughing event followed by a different event of motion.


3 This lexical rule is presumably connected to the morphological rule that the 3pS-affix -ej
is also not allowed to occur with the cislocative affix -aj. Considering the phonological and
morphological similarity between the verb for 'come' and the cislocative marker it seems
likely that the verb is either the historical source of the affix or a fusion of the affix with
other material.

(29) asuŋ liɖa-ru liɖa-ru ɖuŋ-ŋu uj-j
house laugh-P laugh-P depart-P go-P
‘He laughed and laughed and went to the house.’ 4

The sentence in (29) demonstrates that the primary word order
S O V, or NP NP V, as established in section 1.2 above is still
basically in accordance with the facts if V is replaced by the
verbal complex (VC) established here. As a result, the basic
syntactic structure of a sentence in Gorum has to be analyzed
as NP_S NP_O VC.

Another aspect that I will mention here, but cannot discuss
further, is the internal structure of the verbal complex. I have
assumed so far that the complex consists of a simple sequence
of verbs and thus is a symmetrical or coordinated structure, but
as mentioned in section 2.3 the stative construction, despite the
fact that both verbs are equally finite, displays a dependency of
V1 on V2 with respect to tense. This raises the question of how a
sentence like (30), in which a verbal chain of two verbs takes
the position of V1 in a stative construction, relates to an
unambiguously coordinated structure as in example (29)
above.

(30) od otur ɖuŋ-ŋu uj-ju ɖukuʔ
yonder from depart-P go-P be/AFF
‘He has left from there.’

To sum up our findings in this section, there is strong evidence
that there is a category larger than the verb in the grammatical
structure of Gorum which, in contrast to a VP, consists only of
one or more verbs and potentially of clitics and a particle. I
have called this category the verbal complex, and its internal
structure as analyzed so far is represented in (31):

(31) [VERB ... VERB ] =PL=PROG HON

4 Aze & Aze (1973, p. 226)

On a more detailed level the morphological structure has to be
analyzed as represented in table 5. This table only shows the
structure of the final part of the verbal complex. The material
labelled verb, which comprises the structure from the subject
prefix slot SUBJ to the cislocative slot CISL, can be iterated to
the left.

Verb                                                          Clitics

SUBJ-  MOD-  CAUS-  RDL-  ROOT  -LV  -TENSE     -OBJ  -CISL  =PL      =PROG  HON

1sS-   NEG-  CAUS-  RDL-        -LV  -F         -1sO  -CISL  =3pS     =PROG  HON
2sS-   IRR-                          -P         -2sO         =IMP/PL
1pS-                                 -INF/TR    -1pO
2pS-                                 -INF/INTR  -2pO
                                                -3pS

Table 5: The structure of the verbal complex
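The slot structure of the verbal complex can be read as an ordered template. The Python sketch below assembles forms from slot fillers in that order; it is only an illustration, with hypothetical function names and a dictionary representation that is not part of the original analysis. Affixes are joined with "-", clitics with "=", and the honorific particle follows as a separate word.

```python
# Hypothetical sketch: assembling a verbal complex from ordered slots.
from typing import Dict, List, Optional

AFFIX_SLOTS = ["SUBJ", "MOD", "CAUS", "RDL", "ROOT", "LV", "TENSE", "OBJ", "CISL"]
CLITIC_SLOTS = ["PL", "PROG"]


def build_verb(fillers: Dict[str, str]) -> str:
    """Join the filled affix slots of one verb in template order."""
    return "-".join(fillers[slot] for slot in AFFIX_SLOTS if slot in fillers)


def build_complex(verbs: List[Dict[str, str]],
                  clitics: Optional[Dict[str, str]] = None,
                  honorific: Optional[str] = None) -> str:
    """Iterate the verb template, then add clitics and the particle once."""
    words = [build_verb(verb) for verb in verbs]
    for slot in CLITIC_SLOTS:
        if clitics and slot in clitics:
            words[-1] += "=" + clitics[slot]  # clitics follow the last verb only
    if honorific:
        words.append(honorific)
    return " ".join(words)


# (4) 'I will teach you.'
print(build_verb({"SUBJ": "ne", "CAUS": "ab", "ROOT": "so'e",
                  "TENSE": "t", "OBJ": "om"}))  # ne-ab-so'e-t-om
# (19) 'They will come.' (honorific), written with the clitic boundary "="
print(build_complex([{"ROOT": "baU", "TENSE": "t", "CISL": "aj"}],
                    clitics={"PL": "gi"}, honorific="aom"))  # baU-t-aj=gi aom
```

Keeping the clitic slots outside `build_verb` reflects the claim that these formatives belong to the complex as a whole and not to any single verb.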


4. Conclusion 


The category verbal complex is an essential notion for
understanding the morphology and syntax of the sentence in
Gorum. Only with this structure can the distribution of verbal
formatives be understood, as it explains the behaviour of
the formatives I have called clitics. Their morphosyntax cannot
be fully accounted for on the level of verbs. The verbal complex
can also help us to understand certain aspects of the linear order
in Gorum sentences and thus allows us to explain the complex
interaction of morphological and syntactic principles in this
language.

Far from having delivered an exhaustive analysis of the
verb in the morphosyntax of Gorum, I hope to have shown that
it is an interesting domain. Further research is needed to
understand the morphosyntactic structure involved. The notion
of a verbal complex may be helpful in this endeavour.

FEATURES OF WORD PHONOLOGY OF THE
AUSTRO-ASIATIC LANGUAGES OF INDIA

Pramod Pandey
Jawaharlal Nehru University, New Delhi

ABSTRACT 

In this paper, I propose to present an overview of the 
features of the word phonology of Austro-Asiatic 
languages of India in relation to the following aspects: 
syllable structure, vowel and consonant phoneme 
inventories, and constraints on their patterns. Where 
relevant, I also point out comparative features of other 
language groups of India. The study reported in this 
paper is based on a broader study of the phonological 
patterns of Indian languages. 


1. Introduction 

The main objective of the present paper is to give a general 
account of the features of the word-phonology of Austro- 
Asiatic languages of India, based on their available 
descriptions. The Austro-Asiatic languages taken into account 
include: Bhumij (Ramaswami 1992), Bonda (Ramchandrarao 
1986), Didayi (Ghosh 1996, Ashirvadam 1992), Ho 
(Bhattacharya 1980), Juang (Dasgupta 1976), Kharia (Biligiri 
1965), Khasi (Henderson 1976, Nagaraja 1985, Rabel 1961), 
Korku (Nagaraja 1999, Zide 1960), Mundari (Cook 1965, 
Osada 1992, Pandey 1989, Sinha 1975), and Santali (De 1996, 
Macphail 1957, Suresh 1986). 

The present outline of the word-phonology of these
languages deals with the following aspects: consonant and
vowel phoneme inventories (§ 2), consonant and vowel
allophones (§ 3), syllable structures (§ 4), and constraints on
the occurrence of consonants and vowels in words (§ 5).

2. Consonant and Vowel Phoneme Inventories 

2.1. Consonants 

Given below is what may be called the canonical inventory of
the consonant phonemes of the Austro-Asiatic languages of
India. The inventory includes two types of consonant
segments: core consonants and consonants that are found in
one or more but not all the languages. The latter are presented
in italics.

Manner / Place  Bilabial  Labio-dental  Dental  Alveolar  Retroflex  Palatal  Velar  Glottal

Plosive         p b                     t d               ʈ ɖ                 k g    ʔ
                pʰ bʱ                   tʰ dʱ             ʈʰ ɖʱ               kʰ gʱ
Nasal           m                       n                 ɳ          ɲ        ŋ
Trill                                           r
Flap                                                      ɽ ɽʱ
Fricative                 v                     s                                    h
Approx.         w         ʋ                                          j
Lat. Approx.                                    l         ɭ
Affricate                                       ts dz                tʃ dʒ

Table 1: Consonant phonemes in the Austro-Asiatic languages of
India.


Of the ten Austro-Asiatic languages included in the present 
study, one, namely Khasi, belongs to the Mon-Khmer group; the
others belong to the Munda group. 

2.2. Vowels 

The vowel phonemes and the languages in which they occur
are listed below. In the list, S stands for short vowels, L for
long vowels, N for short and long nasal vowels, and D for
diphthongs.


No.  Vowel pattern  Languages                Vowel phonemes
     (S-L-N-D)

1.   5-0-0-0        Juang, Mundari, Santali  S: i e u o/o a
2.   6-0-0-0        Ho                       S: i e u o o a
3.   5-5-0-0        Kharia                   S: i e u o i a
                                             L: i: e: u: o: a:/a:
4.   6-5-0-0        Khasi                    S: i e u o i a
                                             L: i: e: u: o: a:
5.   6-0-(6)-3      Didayi                   S: i e u o i a
                                             N: (lectio o)
                                             D: ai au oi
6.   5-5-1-0        Korku                    S: i e u o a
                                             L: i: e: u: o: a:
                                             N: a
7.   6-6-3-0        Bhumij                   S: i e u o i a
                                             L: i: e: u: o: o: a:
                                             N: e o a
8.   9-4-6-0        Bonda                    S: i i e u u o o a 0
                                             L: i: e: u: a:
                                             N: i e u 6 o a

Table 2: Vowel phoneme patterns in the Austro-Asiatic languages of
India

As the list shows, there is considerable variation in the
vowel patterns of the Austro-Asiatic languages. This is in
contrast to the languages of the Dravidian group, which show
consistency, with most of them having equal numbers of long
and short vowels. The core vowels of the Austro-Asiatic
languages are the five short vowels /i e u o a/. Four of the ten
languages do not have long vowels, and only four have nasal
vowels. Only Didayi has diphthongs, but no long vowels.

3. Consonant and Vowel Allophones 
3.1. Consonant allophones 

Studies of the Austro-Asiatic languages of India are usually
found to be short on allophonic details. The allophones listed
below are not claimed to be exhaustive, but they are fairly
comprehensive, based on the existing studies. A close
consideration reveals the following features. The context in
which allophonic variation takes place is predominantly
intervocalic or syllable-final (i.e. Coda). What is interesting,
however, is that the languages evince different concatenative
features in identical environments. Thus Korku and Bhumij
contrast with regard to the plosives, which have lax and tense
allophones respectively in the same context, as can be seen in
Table 3. The processes listed therein are not defined here;
their names are assumed to be self-explanatory.


Consonant group  Process             Context       Language

Approximants     Strengthening       V_V           Korku
Nasal            Place Assimilation  [joJ < A)/_i  Bhumij
Plosives         Tensing             V_V           Bhumij
                 Laxing              V_V           Korku
                 Glottalization      V_V           Korku
                                     Everywhere    Juang
                 Preglottalization   in Coda       Mundari
                 Postglottalization  in Coda       Mundari, Kharia
                                     _#            Santali
                 Spirantization      V_V           Kharia

Table 3: Consonantal allophonic processes in the Austro-Asiatic
languages of India

3.2. Vowel allophones 

Apart from nasalization, there is hardly any discussion of other 
allophonic features of vowels in these languages. 

4. Syllable Structures 

The languages have the following canonical syllable structures
(CSS):

Sr. No.  CSS                 Language

1.       (C)V(C)             Ho, Kharia, Mundari, Santali
2.       (C)(C)V(C)          Didayi, Juang
3.       (C)(C)V(C)(C)       Bhumij, Korku
4.       (C)(C)(C)V(C)(C)    Khasi
5.       (C)(C)(C)V(C)(C)    Bonda

Table 4: Canonical syllable structures in the Austro-Asiatic
languages of India
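
A canonical syllable template of this kind is a compact statement of every syllable shape a language licenses. The sketch below, offered only as an illustration, expands a template into its full set of CV shapes; the function names expand_template and fits are hypothetical.

```python
import re
from itertools import product


def expand_template(template):
    """Expand a template such as '(C)V(C)' into all CV shapes it licenses."""
    # Each token is either an optional slot, e.g. '(C)', or an obligatory
    # slot, e.g. 'V'. Optional slots may be empty or filled.
    tokens = re.findall(r"\(([CV])\)|([CV])", template)
    choices = []
    for optional, obligatory in tokens:
        if optional:
            choices.append(["", optional])
        else:
            choices.append([obligatory])
    shapes = {"".join(combo) for combo in product(*choices)}
    return sorted(shapes, key=lambda shape: (len(shape), shape))


def fits(shape, template):
    """Check whether a concrete shape such as 'CVC' fits a template."""
    return shape in expand_template(template)


print(expand_template("(C)V(C)"))     # ['V', 'CV', 'VC', 'CVC']
print(fits("CCVC", "(C)(C)V(C)"))     # True
print(fits("CVCC", "(C)(C)V(C)"))     # False
```

The expansion describes only the maximal template; language-specific restrictions on onsets and codas are not modelled here.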

Some of the languages have many exceptions to the full
realization of the canonical syllable, especially in the onset and
coda positions in the word. These restrictions are described in
detail in the following section.

5. Distribution of Phonemes

5.1. Consonant Phonemes

The only general constraint on the occurrence of individual
consonant phonemes in these languages concerns retroflex
consonants, which do not occur word-initially.
Generalizations on the occurrence of consonants in clusters are
available for each of these languages, and are stated in tabular
form in the following subsections.

5.1.1. Consonant clusters 

The distribution of consonant clusters is presented below in
terms of a two-fold division: one, into word-initial and
word-final positions, and the other into 2-consonant and
3-consonant clusters. The data are presented in tables, using the
following abbreviations.

GEN=General, MS=Melodic Sequence, C=Consonant, O=Obstruent,
S=Stop, P=Plosive, Pvl=Voiceless Plosive, Pvd=Voiced Plosive,
PA=Aspirated Plosive, PU=Unaspirated Plosive, F=Fricative,
Sn=Sonorant, A=Approximant, N=Nasal, G=Glide, L=Liquid,
LS=Laryngeal Stop, G=Geminate (e.g., GN=Geminate Nasal),
H=Homorganic (e.g., HS=Homorganic Stop), NX=Non-X (e.g.
NN=Non-nasal)

We have used the term melodic sequence to distinguish
specific melodic sequences from general sequences
permitted in these languages. For example, as shown in Table
4, whereas in Korku C+G sequences have a general
distribution, in Bhumij only the specific melodic realizations
sw and pj occur.
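
The difference between a general (GEN) licence and a list of melodic sequences (MS) can be made concrete with a small lookup. This is a sketch under stated assumptions: the glide set {w, j} and the names GLIDES, C_PLUS_G and allows_initial_cg are introduced for illustration, and only the two languages mentioned in the paragraph above are covered.

```python
GLIDES = {"w", "j"}  # assumed glide inventory, for illustration only

# Word-initial C+G licensing as described above: "GEN" means that any
# consonant + glide onset is permitted; a set lists the only melodic
# sequences that occur.
C_PLUS_G = {
    "Korku": "GEN",
    "Bhumij": {"sw", "pj"},
}


def allows_initial_cg(language, cluster):
    """Check whether a two-segment consonant + glide onset is licensed."""
    if len(cluster) != 2 or cluster[1] not in GLIDES:
        raise ValueError("expected one consonant followed by one glide")
    rule = C_PLUS_G[language]  # KeyError for languages not in this toy table
    if rule == "GEN":
        return True
    return cluster in rule


print(allows_initial_cg("Korku", "kw"))   # True: general distribution
print(allows_initial_cg("Bhumij", "sw"))  # True: listed melodic sequence
print(allows_initial_cg("Bhumij", "kw"))  # False: not a listed sequence
```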

5.1.1a Word-initial

The distribution of 2-consonant clusters in the word-initial
position is presented below. The abbreviations are as
elaborated above.

No.  Cons. Classes  GEN/MS             Language

1.   C+C            GEN                Didayi, Khasi
2.   C+G            GEN                Korku
                    sw                 Bhumij
                    pj                 Bhumij
3.   C+L            GEN                NIL
                    k/l/s+r            Bonda
                    p/sr               Bhumij
                    S+r                Juang
                    k/g+l              Bonda
4.   C+N            GEN                NIL
                    sn                 Bonda
                    kn                 Bonda
5.   N+C            GEN                NIL
                    ns, i[<X, ŋk, mb   Bonda

Table 4: Distribution of 2-consonant clusters in the word-initial
position in the Austro-Asiatic languages of India.

Constraints on the occurrence of 3-consonant clusters in the 
Onset position are found only in Bonda, which does not allow 
N+S+L sequences. 

5.1.1b Word-final

Constraints on the occurrence of consonant clusters in the
Coda position are given in the following table. These are on
2-consonant clusters only.

No.  Cons. Classes  GEN/Mel. Seq.   Language

1.   N+C            GEN             NIL
                    nt              Bhumij
                    ŋk, nt, nd      Bonda
2.   HN+HS          GEN             NIL
                    nd/t, ng        Korku
                    ŋk/g, ŋp        Bonda
                    nt, nk          Kisan
3.   LS+P           GEN             NIL
                    ʔp/t            Bhumij
                    ʔp
4.   LS+N/L         GEN             NIL
                                    Bonda
5.   L+C            GEN             NIL
                    rt              Bonda
6.   G+C            GEN             NIL
                    j+C             Khasi
                    jc              Korku

Table 5: Distribution of 2-consonant clusters in the word-final
position in the Austro-Asiatic languages of India.

3-consonant clusters do not occur in the word-final position in 
these languages. 

5.2. Vowel Phonemes 

There are very few constraints on the general
occurrence of vowel phonemes in initial, medial, and final
positions in words. Specific languages may have restrictions
on individual vowel phonemes, but no generalizations are
available for distributional restrictions on vowels in these
languages.

In contrast to the occurrence of vowels in different
positions in words in the Austro-Asiatic languages of India, a
generalization is available for their occurrence in a sequence:
all the Austro-Asiatic (A-A) languages permit vowel sequences.
Note that for languages such as Gadaba-Konekor (see
Bhaskararao 1980), Tamil (see e.g. Vasanthakumari 1979), and
Boro (Bhattacharya 1977), among others, a markedness constraint
must be posited to account for the absence of vowel sequences
in them. The constraint may be stated as follows:

NOVOWELSEQUENCE/WORD: Vowels cannot occur
adjacently within a word.

Languages such as Gadaba-Konekor, Tamil and Boro respect it 
both within and across morphemes. Some languages, however, 
permit vowel sequences across morphemes, but not within 
them, such as Chokri (see Bielenberg and Nienu 2001) and 
Meitei (see Chelliah 2003). For languages such as Chokri and 
Meitei, the constraint must be stated as restricted to the 
morpheme: 

NOVOWELSEQUENCE/MORPH: Vowels cannot
occur adjacently within a morpheme.

The investigation of constraints on vowel sequences, it should
be obvious, can provide phonological evidence for
grammatical awareness. Note that the general markedness
constraint NOVOWELSEQUENCE is the opposite of the
markedness constraint NOCODA (see Smolensky and Prince
1993). Languages that respect NOCODA must violate
NOVOWELSEQUENCE. It is interesting to note that all the
Indian A-A languages (A-N group excluded) violate
NOVOWELSEQUENCE, compared to many Austronesian
languages observed in the literature.
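
The difference between the word-level and the morpheme-level versions of the constraint can be checked mechanically once a word is segmented into morphemes. The sketch below is illustrative only: the five-vowel set and the test forms are hypothetical and are not data from any of the languages cited.

```python
VOWELS = set("aeiou")  # assumed toy vowel inventory


def has_vowel_sequence(segment):
    """True if two vowels are adjacent anywhere in the string."""
    return any(a in VOWELS and b in VOWELS for a, b in zip(segment, segment[1:]))


def violates_word(morphemes):
    """NOVOWELSEQUENCE/WORD: no adjacent vowels anywhere in the word."""
    return has_vowel_sequence("".join(morphemes))


def violates_morph(morphemes):
    """NOVOWELSEQUENCE/MORPH: no adjacent vowels inside any one morpheme."""
    return any(has_vowel_sequence(m) for m in morphemes)


# Hypothetical words, given as lists of morphemes.
for word in (["kat", "i"], ["ka", "i"], ["kai"]):
    print("-".join(word), violates_word(word), violates_morph(word))
```

A form such as ka-i violates only the word-level constraint, which is the pattern described for languages that permit vowel sequences across morphemes but not within them.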

6. Conclusion 

An attempt has been made above to put together facts about
the word phonology of the Austro-Asiatic languages of India.
The general picture of the sound structure of these languages
adds to earlier studies, which are either
rather general studies of certain aspects of the word phonology
of Indian languages, such as Ramanujan and Masica (1969),
Ramaswamy (1983), Reddy (1987), and Pandey (2005), or studies
of some specific aspect of the Austro-Asiatic languages, such as
Donegan (1978) and Rao (1989). The paper presents a
relatively comprehensive list of topics for the study of the
general properties of the phonological structure of linguistic
groups. It is hoped that such a framework is useful for carrying
forward the work on comparative philological studies begun in
Stampe (1963), Zide (1965), and Munda (1970).

ABSTRACT

The relation between language and culture has been
problematic and controversial. Scholars like Boas
(1911), Herskovits (1969), and Hoijer (1948) accept
language as an important aspect of culture whereas
Kroeber (1967) and Levi-Strauss (1977) place it on a
par with the latter. Again, most of the studies dealing
with the relation of language to culture (e.g. Boas
1911, Goodenough 1957) are confined to the
meanings of words only. For these scholars, the
vocabulary of a language is very significant for
cultural study, and they do not consider grammar for
the same purpose. This view is quite explicit in the
following statements by Kroeber (1967:229): '...
vocabulary is largely a cultural matter.' and
'Grammar seems to be little influenced by culture
status.' Such studies aim at finding out parallels
between vocabulary items and cultural items. But
there are other scholars, like Sapir (1949) and Whorf
(1956), who have tried to relate grammar or
grammatical categories to culture. The present paper,
on the basis of Sora data, tries to examine if there
really exists a parallelism between a people's culture
and the grammar of their language.

It is difficult for a modern linguist to confine himself
to his traditional subject matter. Unless he is
somewhat unimaginative, he cannot but share in
some or all of the mutual interests which tie up
linguistics with anthropology and culture history,
with sociology, with psychology, with philosophy,
and, more remotely, with physics and physiology.

(Sapir 1949:68)


1. Introduction 

The relation between language and culture has been
problematic and controversial. Scholars like Boas (1911),
Herskovits (1969), and Hoijer (1948) accept language as an
important aspect of culture whereas Kroeber (1967) and
Levi-Strauss (1977) place it on a par with the latter. Again,
most of the studies dealing with the relation of language to
culture (e.g. Boas 1911, Goodenough 1957) are confined to the
meanings of words only. For these scholars, the vocabulary of a
language is very significant for cultural study, and they do not
consider grammar for the same purpose. This view is quite
explicit in the following statements by Kroeber (1967:229): '...
vocabulary is largely a cultural matter.' and 'Grammar seems to
be little influenced by culture status.' Such studies aim at
finding out parallels between vocabulary items and cultural
items. But there are other scholars, like Sapir (1949) and
Whorf (1956), who have tried to relate grammar or
grammatical categories to culture. The present paper, on the
basis of Sora data, tries to examine if there really exists a
parallelism between a people's culture and the grammar of
their language.

2. Sora Kinship Terms 

Sora is an aboriginal language of India belonging to the Munda
group of the Austroasiatic family of languages. According to
Zide (1969), Sora belongs to the Koraput Munda sub-branch of
the Southern Munda branch of the Munda group, whereas
Bhattacharya (1975) groups it under the Intermediary Munda
sub-branch of the Upper Munda branch of the same group. Its
speakers reside mainly in the districts of Ganjam, Koraput and
Phulbani in Orissa and in Srikakulam in Andhra Pradesh. The
Soras practise patrilineal descent and patrilocal
marriage. With this information at hand, let us discuss the
kinship terms of reference of the Sora language from a male
ego's point of view. Sora has a descriptive kinship
terminology, as it makes a distinction between lineal and
collateral relatives. It should be mentioned here that 'The
same kinship terms are not always used in all Munda
languages, not even in all languages of a single sub-group.
Some terms do, of course, occur in all the major Munda
languages, but the number of such terms is small.'
(Bhattacharya 1970:445) In this paper, only the close
consanguineal and affinal kin terms within two ascending and
two descending generations of ego are taken into
consideration. These terms have been collected from the
Paniganda village of the Ganjam district. Let us now arrange
them in the following manner for convenience:

Second ascending generation (G+2):

Father's father                  : juju
Father's mother                  : yuyu
Mother's father                  : jujuŋ
Mother's mother                  : yuyuŋ

First ascending generation (G+1):

Father                           : buaŋ
Father's elder brother           : tataŋ
Father's elder brother's wife    : yayaŋ
Father's younger brother         : dadi
Father's younger brother's wife  : yayaŋ
Father's sister                  : awaŋ
Father's sister's husband        : mamaŋ
Mother                           : yaŋ
Mother's brother                 : mamaŋ
Mother's brother's wife          : awaŋ
Mother's sister                  : awaŋ
Mother's sister's husband        : dadi

Ego's generation (G0):

Elder brother                    : kakuŋ
Elder brother's wife             : kakiŋ
Younger brother                  : uba
Younger brother's wife           : kaoŋ
Elder sister                     : kakiŋ
Elder sister's husband           : bauŋ
Younger sister                   : ya?i
Younger sister's husband         : royam

First descending generation (G-1):

Son                              : o?on
Daughter                         : o?on
Brother's son                    : o?on
Brother's daughter               : o?on
Sister's son                     : omossi
Sister's daughter                : omossi

Second descending generation (G-2):

Son's son                        : upleŋ
Son's daughter                   : upleŋ
Daughter's son                   : upleŋ
Daughter's daughter              : upleŋ


Since the Soras have a patriarchal family structure, the
primacy of men in the family is very strong and the wife is
treated as an outsider in her husband's house. After marriage she
does not join her husband's family but continues in the group
of her birth till death (Elwin 1955:53). So in such a
male-dominated society it is natural to expect more distinctions
in the male kin terms than in the female ones, and the Sora
language confirms this.


The above data on Sora kinship show that there is no
neutralization in G+2. It is probably because there are only
four terms. But in G+1 the same term /awaŋ/ denotes father's
sister and mother's sister (also mother's brother's wife) whereas
there are different terms for father's brothers, i.e. /tataŋ/ and
/dadi/, and mother's brother, i.e. /mamaŋ/. It is clear that the
distinction between father's sister and mother's sister is
neutralized whereas it is maintained in the case of father's
brothers and mother's brother. Again, the use of separate terms
for father's elder brother and father's younger brother implies
that an age distinction is made in the case of father's
brothers. But no such distinction is found in the case of
mother's brothers, and only /mamaŋ/ is used for both the elder
and the younger brothers of mother. From all these, it is
evident that neutralization with respect to age takes place in
the case of mother's brothers whereas a distinction is
maintained in the case of father's brothers. Then, though a
distinction is made in the case of father's elder and younger
brothers, it is lost in the case of their wives. The term /yayaŋ/
denotes both father's elder brother's wife and father's younger
brother's wife. On the side of the male kin, neutralization is
found only between father's younger brother and mother's
sister's husband, as /dadi/ is used to denote both of them, and
between mother's brother and father's sister's husband, as
/mamaŋ/ is used to denote both these relatives.

In G0, neutralization is noticed only with respect to certain
female kin: the distinction between elder brother's wife and
elder sister is lost, and /kakiŋ/ denotes both of them. But no
such neutralization is noticed in the case of male kin
belonging to this generation.

Neutralization of age difference and of the difference between
consanguineal and affinal relatives has been noticed so far in
G+1 and G0. But down in the hierarchy, in the descending
generations, neutralization is much more frequent than in the
ascending ones. In G-1, /o?on/ is used not only for ego's son
and daughter, but also for ego's elder brother's and younger
brother's sons and daughters. On the other hand, a separate
term /omossi/ is used for ego's sister's son and daughter, which
indicates a dichotomy between ego's sisters on the one hand
and ego and ego's brothers on the other hand.

The data of G-2 show that there is only one term, /upleŋ/,
which is used for sons and daughters of /o?on/ 'ego's or
brother's offspring' and /omossi/ 'sister's offspring'. It is quite
important to note that while only the age distinction and the
consanguineal-affinal distinction get neutralized in G+1 and
G0, the sex distinction is also neutralized in the descending
generations, in addition to the age and consanguineal-affinal
distinctions.
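
The neutralizations described above can be read off the kin term list mechanically by grouping kin types under the term that denotes them. The following sketch assumes the usual anthropological abbreviations (F father, M mother, B brother, Z sister, S son, D daughter, H husband, W wife, e elder, y younger) and a normalized spelling of the terms with final ŋ; both are conventions adopted here for illustration.

```python
from collections import defaultdict

# Kin terms from the list above, keyed by generation and kin type.
# Assumption: the final nasal of the terms is written as ŋ.
KIN_TERMS = {
    "G+2": {"FF": "juju", "FM": "yuyu", "MF": "jujuŋ", "MM": "yuyuŋ"},
    "G+1": {
        "F": "buaŋ", "FeB": "tataŋ", "FeBW": "yayaŋ", "FyB": "dadi",
        "FyBW": "yayaŋ", "FZ": "awaŋ", "FZH": "mamaŋ", "M": "yaŋ",
        "MB": "mamaŋ", "MBW": "awaŋ", "MZ": "awaŋ", "MZH": "dadi",
    },
    "G0": {
        "eB": "kakuŋ", "eBW": "kakiŋ", "yB": "uba", "yBW": "kaoŋ",
        "eZ": "kakiŋ", "eZH": "bauŋ", "yZ": "ya?i", "yZH": "royam",
    },
    "G-1": {
        "S": "o?on", "D": "o?on", "BS": "o?on", "BD": "o?on",
        "ZS": "omossi", "ZD": "omossi",
    },
    "G-2": {"SS": "upleŋ", "SD": "upleŋ", "DS": "upleŋ", "DD": "upleŋ"},
}


def neutralizations(generation):
    """Map each term used for more than one kin type to those kin types."""
    by_term = defaultdict(list)
    for kin_type, term in KIN_TERMS[generation].items():
        by_term[term].append(kin_type)
    return {term: kin for term, kin in by_term.items() if len(kin) > 1}


for generation in KIN_TERMS:
    print(generation, neutralizations(generation))
```

The output lists no shared terms for G+2 and a single term covering all four kin types in G-2, which is the asymmetry the discussion turns on.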

3. Observations on the Kin Terms 

From what has been discussed above, it is clear that the degree
of neutralization with respect to age, consanguineal-affinal
distinction, and sex increases as we go down the hierarchy
from the ascending to the descending generations in
Sora. It should be stated here that this is in full agreement with
the universal laid down by Greenberg (1966:76), which states '...
ascending generations are unmarked as against descending
generations of equal genealogical distance from ego.' In this
context, Boas' (1940:637) statement is worth mentioning: 'The
frequent occurrence of similar phenomena in cultural areas
that have no historical contact suggests that important results
may be derived from their study, for it shows that the human
mind develops everywhere according to the same laws.'
Keeping these in view, a pertinent question may be raised
here: why are the terms of ascending generations unmarked
compared to those of descending generations? Greenberg has not
looked into it and no other scholar seems to have written
anything on this topic. So it is necessary to give some thought
to this question here.

While discussing the centrality of the semiological 
perspective in the study of language, Saussure (1974:16-17) 
states: 'Linguistics is only a part of the general science of 
semiology, the laws discovered by semiology will be 
applicable to linguistics, and the matter will circumscribe a 
well-defined area within the mass of anthropological facts .... 
If we are to discover the true nature of language we must learn 
what it has in common with all other semiological systems....' 

The primary aim of kinship studies by anthropologists so
far has been to discover the underlying structure of the society.
But Saussure's view quoted above makes it clear that a
correlation can possibly be established between the domain of
kinship terms used in a language and the domain of its
linguistic structures. Let us begin with a probable explanation
for the unmarkedness of the ascending kin terms as against the
descending ones. Each human being is well aware of his/her
past experiences. So it is natural for him/her to know his/her
ancestors intimately. But so far as the future is concerned,
he/she is obviously uncertain. In other words, he/she is not
sure of anything in the future. So it is quite likely that he/she
also does not have a clear idea about his/her successors, as they
are yet to come. Most probably for this reason the kin terms
belonging to the past and present generations are less marked
than those belonging to the future generations. If this is
accepted, then by linking it with language a hypothesis may be
put forward that, like the kin terms of the descending
generations, the future tense should also be treated as marked
compared to the other tenses, i.e. past and present. Sora fully
supports this hypothesis.

4. Sora Tense System 

A study of the Sora language reveals that there are only two
tenses in this language, i.e. present and past, and that the
former is employed to convey future time. The following examples
are illustrative:

1. (a) nyeni anditinay
       'I play.'

   (b) nyeni andilinay
       'I played.'

   (c) nyeni anditinay
       'I will play.'

2. (a) anyin anditin
       'He plays.'

   (b) anyin andilin
       'He played.'

   (c) anyin anditin
       'He will play.'
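
The paradigm in examples 1 and 2 amounts to a mapping from three time references onto two verb forms. A minimal sketch of that mapping follows; the dictionary names and the labels "1sg" and "3sg" are introduced here for illustration.

```python
# Verb forms for 'play' as given in examples 1 and 2.
FORMS = {
    ("1sg", "present"): "anditinay",
    ("1sg", "past"): "andilinay",
    ("3sg", "present"): "anditin",
    ("3sg", "past"): "andilin",
}

# Three time references are expressed by two morphological tenses:
# future time reference reuses the present tense form.
TENSE_OF = {"past": "past", "present": "present", "future": "present"}


def verb_form(person, time_reference):
    """Return the verb form used for a person and a time reference."""
    return FORMS[(person, TENSE_OF[time_reference])]


print(verb_form("1sg", "present"))  # anditinay
print(verb_form("1sg", "future"))   # anditinay
print(verb_form("3sg", "past"))     # andilin
```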


It may be noticed here that the present tense forms
/anditinay/ and /anditin/ are used to mark the future in 1(c)
and 2(c) respectively, but there are separate forms, i.e.
/andilinay/ and /andilin/, that mark the past in 1(b) and 2(b)
respectively. So, to use Ultan's (1972) expression, Sora can
be said to possess a 'prospective tense system'.

5. Observations on the Tense System

On the basis of the above discussion, it can now be argued that
most probably there exists a correlation between the marked
character of the kin terms of descending generations and the
absence of a future tense in Sora. It must be recalled here that
Greenberg (1966) categorically mentions that the descending
generations are marked compared to the ascending ones. Now
the problem is to see whether the future tense, like the
descending generations, is universally marked as against the
present and past tenses. After a study of samples from
'approximately fifty languages' from the point of view of
markedness features like relative boundness of forms, temporal
gradation, obligatoriness of occurrence, and neutralization in
different environments, Ultan (1972:116) draws the conclusion
that '... future tenses are generally more marked than either
past or present tenses...' So a correlation can be established
between Greenberg's and Ultan's observations.

6. A Comparison 

As regards the relationship between language and culture, both
are species-specific phenomena. Only human beings possess
language and culture; no other animals do.
Anthropologists largely agree that language and culture are
interdependent and that they have
grown up in correlation with each other. This leads Kroeber
(1967:225) to state: 'So far as the process of their
transmission is concerned, and the type of mechanism of their
development, it is clear that language and culture influences
the other.' Levi-Strauss (1977:71) also advances a similar
view: '... both language and culture are the products of
activities which are basically similar. If we try to formulate our
problem in purely theoretical terms, then it seems to me we are
entitled to affirm that there should be some kind of relationship
between language and culture, because language has taken
thousands of years to develop, and both processes have been
taking place side by side in same minds.' Therefore, instead of
taking either language or culture to be more important
than the other, it seems reasonable to accept them as two
manifestations of the same minds and thus to put them on a par
with each other.

7. Conclusion 

To sum up, the main points discussed in this paper are as
follows. On the basis of Sora data it is argued that, regarding
the relationship between language and culture, Sapir's (1949) and
Whorf's (1956) views are acceptable, not Kroeber's (1967). In
fact, the grammars of culture and language seem to be closely
interwoven with each other, as is evidenced by the fact that
both the descending kin terms and the future tense are marked
in most languages of the world. So it is reasonable to argue
that both language and culture have emerged from a common
source and that they are two manifestations of the same
human mind.

ABSTRACT

This study concentrates on only one aspect, i.e.
negative markers. The following markers are found
in these languages: Korku: baki, heba, ban, dun/duka,
athika, baw, and baŋgon; Mundari: ka, alo, bano?
[-animate], baŋ [+animate, -1sg, -3sg], and baŋgai?
[+animate, +1sg, +3sg]; and Santali: /baŋ/, /alo/, and
/oho/.

All the three languages have bang as the common
negative. Korku has a large number of negative forms
that are not comparable to those of the others.
Interestingly, Mundari and Santali have negative forms
which are closer to each other than to the Korku forms.
Mundari (M) and Santali (S) also have another form, alo,
in common. The only difference between them is ka of M
and oho of S; oho of S is suspiciously close to alo. The
question then arises as to how Korku obtained so many
negative forms. This question needs to be answered through
more work on that language. This paper looks into
these aspects briefly.

Introduction 

The present study tries to identify and compare the negative
markers in three related languages, namely Korku,
Mundari and Santali, of the Munda sub-branch of the Austroasiatic
family. As they belong to one sub-branch (i.e. North Munda),
the languages concerned show close similarities.
However, among these three, Mundari, spoken mainly in the
Ranchi, Hazaribagh and Palamu districts of Jharkhand by more
than a million people, and Santali, spoken mainly in Jharkhand
and parts of Orissa and West Bengal by more than six million,
are closer to each other, not only linguistically but also
geographically. They are normally included in a sub-branch
called Kherwarian, based on a native tradition. The other
language, Korku, is the western-most language of this group,
separated from the other two by more than a couple of
hundred kilometers. It is spoken in the Satpuda and Mahadev
hills, across the borders of Maharashtra and Madhya Pradesh,
by more than half a million people. Because of the
geographical separation, it shows some differences from the
other two languages. Even then, interestingly, Korku is
closer to Mundari than to Santali.

1. Korku 

The Korku language has many negative forms. They are the
following: baki, heba, ban, dun/duka, athika, baw, and
baŋgon. Among these, ban, baw, baki, and baŋgon have the
same or a similar first syllable (CV), and the rest, namely -n,
-w, -ki, and -ŋgon, might mark certain other distinctions. ba-ki
is a negative imperative, while ban and baw mark simple
negation: ban is used in non-past constructions, and baw is
used to negate propositions. baŋgon is used in negative
answers. Except for the negative forms baw, duka and athika,
the others can occur either preceding the verb stem or
following it. The exact position does not make much
difference to their meaning. baw and duka occur only
after the verb stem, the marker athika occurs before the
verb stem, and ban and heba can occur in either position.
When negative markers are used in sentences, tense markers
are not used.
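
The positional statements above can be summarized as a small lookup from marker to permitted position. The sketch below covers only the five markers whose position is stated explicitly; the names POSITIONS and negate are hypothetical, and markers are written as separate words even where the examples attach them to the verb stem.

```python
# Positions relative to the verb stem, as stated in the paragraph above.
POSITIONS = {
    "baw": ("post",),
    "duka": ("post",),
    "athika": ("pre",),
    "ban": ("pre", "post"),
    "heba": ("pre", "post"),
}


def negate(marker, verb):
    """Return the word orders licensed for a negative marker and a verb stem."""
    orders = []
    for position in POSITIONS[marker]:
        if position == "pre":
            orders.append(f"{marker} {verb}")
        else:
            orders.append(f"{verb} {marker}")
    return orders


print(negate("ban", "heje"))     # ['ban heje', 'heje ban']
print(negate("athika", "heje"))  # ['athika heje']
print(negate("duka", "sen"))     # ['sen duka']
```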

(i). heba: It is used in the non-past (habitual) tense. It can
also mark that one does not have the ability or capacity to
perform an act, as well as the negation of a wish, interest or
liking. When ability is negated, the subject takes the
instrumental case marker:

(a) inj heba heje
    I neg come
    'I do not come'.

    inj heba dodon
    I neg lift
    'I do not lift'.

    die heba harey
    he neg know
    'he does not know'.

(b) in-a-ten heba tullu
    I-pos-instr neg lift
    'I cannot lift'.

    dij-a-ten gada heba par(a)mu
    he-pos-instr river neg cross
    'he cannot cross the river'.

    inen ini-kitab heba hona
    I-dat this-book neg need
    'I don't need this book'.

When heba is used, the past aspect marker dun can be used. It
occurs after the verb root:

inj heba sadun
I neg bring-pt
'I did not bring'.

die heba gitijdun
he neg sleep-pt
'he did not sleep'.

heba and ban convey the same sense and are
interchangeable:

heba/ban harey
neg know
'(I) don't know'.

tenj inj ga:wen heba/ban sene
today I village-to neg go
'I do/will not go to the village today'.

(ii) (a). ban:

inj ban heje
I neg come
'I do not come'.

inj kusumusu ma:ndi ban
I adv. speak neg
'I don't speak whisperingly'.

die sene ban 'he does not/will not go'.

inj ban 'I not' = 'I won't' (do something, etc.)

(b). This form is used to negate possessive constructions:

dayten konku-pucuku ban-da:n
brother-to children neg-have (pt)
'The brother did not have any children'.

dijen thayka soya ban
she-to proper support neg
'She does not have proper support'.

(iii). dun/duka: It occurs after the verb, not before it.
It conveys negation in the past tense:

inj den sen-dun/duka       vs.   inj den ban/heba sene
I there go-neg                   I there neg go
'I did not go there'.            'I do not go there'.

die a:y hej-dun/duka
he yet come-neg
'he hasn't come yet'.

[*duka < dun-ka 'neg. marker' + 'definite marker']

a:m dikama:y co: do.-dun/du(n)-ka
you that-work Q do-neg-def
'why did you not do that work?'

inj koldin ga:wen sen-dun-ka
I yesterday village-to go-neg-def
'I did not go to the village yesterday'.

(iv). athika: It is used mainly in conversations to mark
negation. Besides pure negation, the marker conveys
a sense of delay:

die a:y athika heje
he part. neg come
'he hasn't come yet'.

inen dunum athika heje
me-to sleep neg-yet come
'I am not getting sleep yet'.

a:m co:ja athika heje
you why neg-yet come
'why are you not coming?'

(v). baw: It negates propositions. It occurs in verbless
constructions, after the object, and is also used to negate
equational constructions:

inijen ba:te baw (*heba, *ban)
this-anim-to father neg
'this (boy) has no father'.

ina meran dama baw do oro-bi baw
I-pos near money neg and grain-also neg
'I do not have money and also grain'.

In the last example above, heba can be used in place of baw,
though it is not preferred.

beta, die gada baw/hebu
son, that donkey not
'son, that is not a donkey'.

Negation of location is marked by the use of either baw or hebu
(< heba). They can function like verbs, and so are capable of
taking person markers.

ra:ja ennen baw-nec/hebu-nec
king here-at neg-anim
'king is not here'.

ra:ja ennen baw-nec-da:n/hebu-nec-da:n
king here-at neg-be-pt
'king was not here'.

Here the negative markers have taken agreement markers
(animate), similar to verbs.

(vi). baŋgon: It can take tense markers, and so functions more
like a negative verb. (In the texts, only the use of the past
tense has been observed.)

(Ans) ale baŋgon 'we will not (come)',

ale baŋgon-nen 'we did not (go)'

die a:ta jojom baŋgon-nen
he food eat neg-pt
'he did not eat food'.

In the above context ban can also be used, but baw and
athika cannot.

2. Mundari

The negative markers in Mundari are /ka/ and /alo/. /ka/ is
highly productive for lexical and sentence negation in
indicative sentences. It is a morphologically bound form. /alo/
is used for the negation of imperative or optative sentences.

/ka/ functions as the negative particle at
the lexical level. It also functions as the sentence negation
marker and is fixed to the preverbal position. For sentence
negation, /ka/ is placed in the preverbal position and is
followed by the personal suffix, which is a subject
agreement element.

Ranchi-te ka-n sen-ke-na-a
Ranchi-to Neg-1sg go-AM-ITM-Prd
'I didn't go to Ranchi'.

The negative marker /ka/ is usually situated in preverbal
position. It can also be placed in sentence-final
position for affirmation.


Indicative:

ka-e? jom-ke-d-ko-a
Neg-sub eat-AM-TM-obj-Prd
'He/she didn't eat them (animate)'.

Imperative:

alo-m jom-le-a
Neg-sub eat-AM-Prd
'Don't eat it first'.

Optative:

alo-ka-e? jom-le-ko-a
Neg-opt-sub eat-AM-obj-Prd
'He/she would not be allowed to eat them first'.

In the imperative sentence /alo/ functions as the prohibitive
marker. In the optative sentence /alo/ means the negation of
hope and desire.

In sentences with copular verbs, special forms are used to
mark negation. They are /bano?/ for inanimate, /baŋgai?/ for
first person singular and third person singular, and /baŋ/ for
animate other than first person singular and third person
singular.

Negation in copula sentences is slightly more complicated. The
negative of /mena?/ has three variants. For example,

soma ora?-re baŋga?i-i-a
soma house-in Cpl.Neg-3sg-Prd
'Soma is not in the house'.

parkom ora?-re bano?-a
bed house-in Cpl.Neg-Prd
'A bedstead is not in the house'.

hon-ko ora?-re baŋ-ko-a
child-pl house-in Cpl.Neg-3pl-Prd
'Children are not in the house'.

Mundari has generalized it in the following way:

bano? [-animate]

baŋ [+animate, -1sg, -3sg]

baŋgai? [+animate, +1sg, +3sg]
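
The three-way choice among the negative copula forms can be read as a small decision rule. What follows is only a minimal Python sketch of that rule, not part of the original analysis: the function name `negative_copula` and its parameters are hypothetical, and the stems are written here as `bano?`, `baŋ` and `baŋgai?`.

```python
def negative_copula(animate: bool, person: int = 3, number: str = "sg") -> str:
    """Return the Mundari negative copula stem for a given subject.

    Encodes only the three-way split stated in the text; the function
    and argument names are illustrative assumptions.
    """
    if not animate:
        return "bano?"      # inanimate subjects
    if number == "sg" and person in (1, 3):
        return "baŋgai?"    # animate, first or third person singular
    return "baŋ"            # all other animate subjects


# Subjects modelled on the three example sentences:
print(negative_copula(animate=False))                        # parkom 'bedstead'
print(negative_copula(animate=True, person=3, number="sg"))  # soma
print(negative_copula(animate=True, person=3, number="pl"))  # hon-ko 'children'
```

The subject agreement suffix and the predicator /-a/ would still have to be attached to the stem returned here; the sketch covers stem selection only.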

The negative of /tan/ is formed simply by adding the negative
marker /ka/ before /tan/, as with a regular verb.

soma ka tan-i? 

Soma Neg Cpl-3sg 

‘It is not Soma’. 

3. Santali 

Negative particles: There are three negative particles: /baŋ/ as
ordinary negative, /alo/ as prohibitive negative, and /oho/ as
emphatic negative. All of them occur in the pre-verb position,
and in all three cases the suffixed forms of the personal
pronouns are added to them, instead of to the verb, to mark the
animate subject. The ordinary negative /baŋ/ drops its final
consonant when the suffixed forms of the subject pronouns are
added to it, though exceptions are not rare.

/baŋ/: It is used in ordinary negative sentences and also in
conditional clauses.

ba-e em-ad-in-a

Neg-3sg:subj give-pst-1sg:obj
‘he did not give me’.

uni ba-e men-ked-a

he Neg-3sg:subj say-pst
‘he did not say’.

in ba-n baday-a

I-subj Neg-1sg:subj know-M

‘I do not know’,
baŋ ‘no’.

/alo/: It indicates prohibition and occurs with the simple
present/future verb forms in the second person; verb forms
with /alo/ perform the function of the negative imperative.

hande alo-m cal-ak’-a 

there Neg-2sg:subj go-M-fin 

‘you do not go there’. 

alo-m roR-a 

Neg-2sg:subj speak-M-fin 

‘you do not speak’. 

alo-m em-a-e-a 

Neg-2sg:subj give-3sg:obj 
‘you do not give him’. 

/oho/: It is an emphatic negative used with the subjunctive verb
forms and with another aspect in the apodosis part of a
conditional sentence.

oho-n cal-ak’-a

Neg-1sg:subj go-M
‘I do not/shall not go’.

in do oho-n loi-ke-a

I-subj Foc Neg-1sg:subj tell-opt
‘I might not say’.

in do nui gidro oho-n goc’-dare-ke-a

I-subj Foc this boy Neg-1sg:subj kill-opt

‘I am unable to kill this boy’. 
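
The way the three Santali particles host the subject marker can be summarized in a few lines of code. This is a minimal sketch under stated assumptions rather than a full account: the names `PARTICLES` and `negative_host` are hypothetical, the particles are written as `baŋ`, `alo` and `oho`, and the exceptions to final-consonant dropping that the text mentions are not modelled.

```python
# The three negative particles described in this section.
PARTICLES = {"ordinary": "baŋ", "prohibitive": "alo", "emphatic": "oho"}


def negative_host(kind: str, subject_clitic: str = "") -> str:
    """Attach a suffixed subject pronoun to a negative particle.

    Only the regular pattern is covered: the ordinary negative loses
    its final consonant before a subject suffix.
    """
    particle = PARTICLES[kind]
    if not subject_clitic:
        return particle                 # bare particle, e.g. baŋ 'no'
    if kind == "ordinary":
        particle = particle[:-1]        # baŋ -> ba
    return f"{particle}-{subject_clitic}"


print(negative_host("ordinary", "e"))     # as in: uni ba-e men-ked-a
print(negative_host("prohibitive", "m"))  # as in: alo-m roR-a
print(negative_host("emphatic", "n"))     # as in: oho-n cal-ak'-a
```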

4. Conclusion

It is interesting to observe that, of the various negative
markers these languages have, viz.

Korku: baki, heba, baŋ, dun/duka, athika, haw, and baŋgon;

Mundari: ka, alo, bano? [-animate]; baŋ [+animate, -1sg, -3sg];
and baŋgai? [+animate, +1sg, +3sg]; and

Santali: /baŋ/, /alo/, and /oho/

all three languages have baŋ as a common negative. Korku
has a whole range of them not comparable to the others. Mundari
and Santali also have alo in common. The only difference
between them is ka of Mundari and oho of Santali. oho of
Santali is suspiciously close to alo. Further work may
reveal its status better.

Then, the question arises as to how Korku obtained so 
many negative forms. This question needs to be answered 
through more work on that language. 

ABSTRACT

Grierson, G.A. (1908: Vol. III, Part I: 427) pointed
out that many Tibeto-Burman (TB hereafter)
languages have developed a system of
pronominalization and some other linguistic features
that can be ascribed to the influence of Munda
languages once supposed to have been spoken in
the western Himalayas. He based his remarks on the list
of words and phrases prepared by Revd. J. Bruske for
the Linguistic Survey of India. Later on this came to be
known as the Munda Hypothesis. Sten Konow, in
Grierson’s Survey, also based his classification of
Tibeto-Burman languages on the feature of
pronominalization, using the term ‘complex
pronominalized languages’ with a western sub-group and
an eastern sub-group.

Bauman, J. J. (1975) examined this hypothesis,
taking into account data from a number of TB
languages, and arrived at the conclusion that the
pronominalization and agreement systems found in so many
TB languages can be reconstructed for the Proto-TB
stage rather than attributed to the influence of Munda
languages. Bauman, J. J. (1979:419) again argued that
"Languages which possess pronominal agreement
systems may perhaps be regarded as unusual given the
prevailing opinion on Tibeto-Burman as highly
analytic, that a significant membership of the family
does exhibit such patterns and that the phenomenon is
almost certainly reconstructible to
Proto-Tibeto-Burman (*PTB)". Bauman argued that Grierson’s
view of pronominalization as a structural borrowing
from outside Tibeto-Burman, particularly from
Munda, fails on several grounds. Bauman (1979)
further examined case marking and pronominal
agreement morphology and concluded that the system
needs to be reconstructed for *PTB rather than
attributed to mere interfamily influence.

The present paper will discuss the ongoing debate
and offer a fresh look at the system of
pronominalization in Manchad, Bunan, Rongpo
and Byangsi in some detail.

Introduction 

Sten Konow’s classification of the Tibeto-Burman family in
Grierson’s Linguistic Survey of India identified a
Tibeto-Himalayan branch of the Tibeto-Chinese family. The
Tibeto-Himalayan branch was further sub-divided into three groups,
viz. the Tibetan group, the complex pronominalized Himalayan
group and the non-pronominalized Himalayan group. The complex
pronominalized Himalayan group was further divided into a
western sub-group and an eastern sub-group. Robert Shafer
(1955), in his classification of the Sino-Tibetan family, praised
Sten Konow’s classification of the Himalayan languages but
criticized his classification of the Kuki-Chin and Naga languages.
Shafer also named the family Sino-Tibetan and
presented a new scheme of classification. Since the present
paper is not a review of classification, we do not provide
further details here.

The western sub-group of the complex pronominalized
Himalayan group includes the languages of the Indian parts of the
western Himalayas. These are Manchad/Manchati/
Patani/Pattani, Chamba Lahauli, Tinan/Gondhala/Rangloi,
Bunan/Punan/Gahari, spoken in the Lahaul valley in Lahaul-Spiti






Suhnu Ram Sharma


district; Kanashi, in Malana village in Kullu district; and
Kinnauri/Kanawari in Kinnaur district. All these districts are in
the State of Himachal Pradesh. Byangsi, Chaudangsi,
Darmiya/Darma and Raji are spoken in Pithoragarh district, as
was the now extinct language Rankas/Saukia Khun, and
Rongpo is spoken in Chamoli district of Uttarakhand State.
Some Byangsi and Raji villages are also found in the Darchula
sub-division, across the Indian border in Nepal.

The eastern sub-group of the complex pronominalized Himalayan
group includes languages spoken mainly in Nepal, with some
small groups of speakers found in various north-eastern
States of India. These languages are Dhimal, Thami, Thangmi,
Limbu, Khambu, Bahing, Balali, Sangpang, Lohorang,
Lambichong, Waling, Chingtang, Runchengbung, Dungmali,
Rodong or Chamling, Nachereng, Kulung, Thulung,
Chaurashia, Khaling, Dumi, Yakha, Rai or Jimdar, Vayu or
Hayu, Chepang, Kusunda, Bhramu, Thaksya (unspecified), and
several others, spoken in Nepal and in parts of Sikkim and of
Darjeeling and Jalpaiguri in West Bengal.

Grierson, G.A. (1908: Vol. III, Part I: 427) pointed out that
many Tibeto-Burman languages have developed a system of
pronominalization and some other linguistic features that can
be ascribed to the influence of Munda languages once
supposed to have been spoken in the western Himalayas. He based
his remarks on the list of words and phrases prepared by Revd.
J. Bruske for the Linguistic Survey of India. Sten Konow
remarked that “Mr. Bruske’s list makes it, so far as I can
ascertain the old language’s influence of which can still be
traced in the Kanawari dialect, must have belonged to Munda
language family.” Later on this came to be known as the ‘Munda
Hypothesis’. It was based on the linguistic features of
pronominalization, the existence of checked consonants (p’ t’ c’
k’), the counting of higher numbers by twenties and some type of
reduplication. It was emphasized in the Survey that these
features are also found in Munda languages, like Santali.

Furthermore, for the languages of the eastern sub-group, it was
pointed out in LSI Vol. III (Part I: 273) that there is a tendency
to distinguish the person of the subject by means of
pronominal affixes in all the languages, i.e. Dhimal, Thami,
Limbu, Yakha, Khambu, Bahing, etc. The regular place of this
suffix is between the base and the auxiliary, e.g., in Thami:

/hok-ŋga-du/

being I-am ‘I am.’

A similar distinction of the person of the subject by means of
pronominal suffixes is in agreement with the practice of
Munda languages, e.g.:

Santali: /rdgdch-ed-in tahdkana/ 

hungering-I-was 
‘I was hungering.’ 

Limbu: /khene ke-wd/ 

Thou thou-art ‘Thou art.’ 

Higher numbers are counted by twenties in Dhimal, Yakha and
Khambu. But Manchad, Byangsi, Chaudangsi and Darma have
a mixed system of counting higher numerals, i.e., both decimal
and vigesimal systems are used. Reduplication, strictly
speaking echo-formation, is a common feature of many South
Asian languages, and all the Himalayan languages share that
feature. But whether this feature is due to inter-family influence or
is indigenous to some of the Himalayan languages is still
debatable. Surely the ‘linguistic area’ hypothesis can be extended
to include some of these TB languages of the western
Himalayas that were not included in Emeneau’s hypothesis.
The existence of checked consonants (p’ t’ c’ k’) in some of
the western Himalayan languages is another feature ascribed to
Munda influence. But such a feature is found in many
languages independent of contact. Many TB languages have
such consonants in syllable-final position. Therefore, it is
difficult to conclude that such a feature is solely a Munda
feature.






Bauman, J. J. (1975) examined Bruske’s hypothesis, taking
into account data from a number of TB and Munda
languages, and arrived at the conclusion that the pronominalization
and agreement systems found in so many TB languages can be
reconstructed for the Proto-TB stage rather than attributed to
the influence of Munda languages. Bauman (1979:419) again argued that
“Languages which possess pronominal agreement systems may
perhaps be regarded as unusual given the prevailing opinion on
Tibeto-Burman as highly analytic, that a significant
membership of the family does exhibit such patterns and that
the phenomenon is almost certainly reconstructible to
Proto-Tibeto-Burman (*PTB)”. Bauman argued that Grierson’s
view of pronominalization as a structural borrowing from
outside Tibeto-Burman, particularly from Munda, fails on
several grounds. Bauman (1979) further examined case
marking and pronominal agreement morphology and concluded
that a system needs to be reconstructed for *PTB rather than
attributed to mere interfamily influence.

Sharma, S.R. (1996) examined his own recent field data
from Manchad, Bunan and Byangsi on pronominalization and
agreement and concluded that pronominalization in the various
western Himalayan languages cannot be treated under
blanket terms like ‘agreement system’ or ‘pronominalization’.
Each language follows its own system of agreement and
pronominalization and needs to be examined at various levels
of analysis.

Language Contact Situation 

If we go back in time, it is possible to speculate that Sanskrit,
Pali and the Prakrit languages must have been in contact with Old
Tibetan and its dialects. Indian contacts with Tibet go back to the
6th century, when Buddhism reached Tibet and China. Later on,
the central and western Pahari Indo-Aryan (IA hereafter)
languages came into contact with TB languages in the western
Himalayas. We are also aware that, before the Tibetan empire,
the Zhangzhung language was spoken in
western Tibet. It is important to note here that some traces of
Zhangzhung are found in Manchad, Kinnauri, Byangsi, Bunan
and other pronominalized languages in the western Himalayas.
But, strangely enough, Tibetan and its dialects do not have any
traces of Zhangzhung. If we look at the present state of the
languages in the western Himalayas, a large number of IA loans are
found in all these languages. Again, we find that the number of
IA loans in the pronominalized languages is higher than in
Tibetan and its dialects. This suggests that the pronominalized
languages have had longer contact with IA than Tibetan
and its dialects. Or maybe the Tibetan dialects were more
conservative in borrowing, while the pronominalized languages
were more open for sociolinguistic reasons. One major
reason that I can advance is the acceptance of the caste system and
some other local Indo-Aryan customs and beliefs by the
speakers of Manchad, Kinnauri and others, whereas speakers of
Tibetan dialects like Spiti and Jad did not adopt the
practices followed by the neighbouring IA people. Long-term
social intercourse must therefore have led the TB speakers in this
area to borrow lexical items and, in due course, to develop some
new structural features as well.

Indo-Aryan languages that are in contact with various TB 
languages are Nepali, Kumauni, Garhwali, Chambyali, Kulvi 
and other Pahari dialects. More recently with the spread of 
education and mass media, Hindi and English are being used in 
some walks of life. Most of the schools use Hindi as the 
medium of instruction. 

First of all, if we accept the theory of Munda influence in the
western Himalayas, then that contact must have preceded the
IA contact. In historical linguistics we know that structural
borrowing results only when there is long and sustained
contact between languages. We also need to ask
why only some of the languages in this area were influenced.
None of the Tibetan dialects in the western Himalayas has
either the feature of pronominalization or any other type of
agreement system. Languages like Ladakhi, Tod, Khoksar,
Patnam Bhoti, Spiti and Jad, found in this area, do not share this
feature. Therefore, the contact between IA and TB languages
in the western Himalayas provides us with only a partial answer to the
question of whether pronominalization is a simple contact
phenomenon. Moreover, there are no authentic historical
records to support the view that Munda people once lived in the
western Himalayas.

Pronominalization 

Now I shall show the systems of pronominalization and
agreement of four languages that I have studied in detail, viz.,
Manchad, Bunan, Byangsi, and Rongpo. Although all these
languages incorporate the subject pronoun in
the verbal complex, each of them follows a distinctive
system of its own. They cannot all be placed under the single
term ‘pronominalization’.

Manchad verbal paradigms incorporate the first and
second person pronominal elements of their subject pronouns.
The subject of a transitive verb takes ergative case
marking in all tenses. However, ergative marking is not
overt on the first person singular pronoun. The examples
below show the system of pronominal marking in the verbs.


1. gye-Ø        ti      tuŋm-a:ta-g
   I-ERG        water   drink-PR-1s
   ‘I drink water.’

2. ne-ku-i      ti      tuŋm-a:ta-si
   we-d-ERG     water   drink-PR-1d
   ‘we (d) drink water.’

3. ne-tsi       ti      tuŋm-a:ta-ni
   we-p.ERG     water   drink-PR-1p
   ‘we (p) drink water.’

4. ke-i         ti      tuŋm-a:ta-n
   you.s-ERG    water   drink-PR-2s
   ‘you (s) drink water.’

5. ke-ku-i      ti      tuŋm-a:ta-si
   you-d-ERG    water   drink-PR-2d
   ‘you (d) drink water.’

6. ke-tsi       ti      tuŋm-a:ta-ni
   you-p.ERG    water   drink-PR-2p
   ‘you (p) drink water.’

7. do-i         ti      tuŋm-a:
   s/he-ERG     water   drink-3PR
   ‘s/he drinks water.’

8. do-ku-i      ti      tuŋm-a:to-ku
   s/he-d-ERG   water   drink-PR-d
   ‘they (d) drink water.’

9. do-tsi       ti      tuŋm-a:to-re
   they-p.ERG   water   drink-PR-p
   ‘they (p) drink water.’



From the above examples we can say that in the first person
singular verb form in (1) the pronominal element is
marked by /-g/, first person dual is marked by /-si/ in (2),
first person plural is marked by /-ni/ in (3), second person
singular is marked by /-n/ in (4), and the second person dual and
plural elements are identical with the first person dual and plural
respectively. The third person singular is unmarked for person,
and the third person verb forms receive the category of number
only, i.e. the dual and plural suffixes /-ku/ and /-re/
respectively, as in (8) and (9).
The third person in Manchad and other Himalayan languages is
a demonstrative only. The demonstrative is used to indicate both a
third person and a thing. The subject is marked with the ergative
case /-i/ or /-tsi/ in transitive verbs. But there is no marking of
the object in any of the languages in the western Himalayas. A
verb in Manchad always has seven distinct forms with regard
to number and person, and transitive subjects are marked with
ergative case.
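
The count of seven distinct forms can be checked mechanically from the endings seen in examples (1) to (9). The sketch below is illustrative only: the table name `MANCHAD_ENDINGS` and the helper functions are hypothetical, the empty string stands for the unmarked third person singular, and the tense element preceding each ending is left out.

```python
# Agreement endings read off examples (1)-(9); "" marks the unmarked 3s form.
MANCHAD_ENDINGS = {
    ("1", "s"): "g",  ("1", "d"): "si", ("1", "p"): "ni",
    ("2", "s"): "n",  ("2", "d"): "si", ("2", "p"): "ni",
    ("3", "s"): "",   ("3", "d"): "ku", ("3", "p"): "re",
}


def distinct_endings(paradigm):
    """Set of different endings used across the whole paradigm."""
    return set(paradigm.values())


def shared_cells(paradigm):
    """Endings that serve more than one person-number cell."""
    groups = {}
    for cell, ending in paradigm.items():
        groups.setdefault(ending, []).append(cell)
    return {ending: cells for ending, cells in groups.items() if len(cells) > 1}


print(len(distinct_endings(MANCHAD_ENDINGS)))  # 7
print(shared_cells(MANCHAD_ENDINGS))
```

Nine person-number cells thus reduce to seven forms, because /-si/ and /-ni/ each serve both first and second person.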

Bunan follows yet another system of agreement and
pronominal marking. The basic distinction is between
first person and non-first person, in both the singular and the
plural; Bunan therefore has four verb forms, viz., first person
singular, non-first singular, first person plural and non-first
plural. In Bunan there is no dual
number and no ergative marking on the subject. Person and
number elements are fused into a single form: /-ye/ for first
person singular and /-ni/ for
second and third person singular. First person plural is
represented by /-khe/ and non-first plural is marked by /-kha/.
The following examples
illustrate the point:

Present tense or present habitual forms of the verb /tuŋg-/ ‘to
drink’:

1.s      /tuŋg-ye/    ‘I drink.’
2,3.s    /tuŋg-ni/    ‘you/s/he drink.’
1.p      /tuŋ-khe/    ‘we drink.’
2,3.p    /tuŋ-kha/    ‘you (p)/they drink.’


This is different from Manchad, where first and second persons
are marked in the verb along with fused number affixes and, in the
third person, only the number category is reflected, i.e., singular,
dual and plural. All three persons are distinct in Manchad,
whereas Bunan has a basic distinction between first and
non-first persons. This means that pronominalization as a
phenomenon in Bunan is restricted to the first person singular and
plural only. With second and third person singular and
plural forms the use of the subject pronoun is a must; otherwise the
sentence is ambiguous.
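
The ambiguity of the non-first forms can be made explicit by asking which person-number cells are compatible with a given ending. The following is a minimal sketch, assuming only the four Bunan endings given above; `BUNAN_ENDINGS`, `readings` and `needs_overt_pronoun` are hypothetical names introduced for illustration.

```python
# Person-number endings of the Bunan present, as given in the text.
BUNAN_ENDINGS = {
    ("1", "s"): "ye",  ("2", "s"): "ni",  ("3", "s"): "ni",
    ("1", "p"): "khe", ("2", "p"): "kha", ("3", "p"): "kha",
}


def readings(ending):
    """All person-number cells compatible with a given ending."""
    return [cell for cell, e in BUNAN_ENDINGS.items() if e == ending]


def needs_overt_pronoun(person, number):
    """True when the verb ending alone does not identify the subject."""
    return len(readings(BUNAN_ENDINGS[(person, number)])) > 1


print(readings("ni"))                 # second and third person singular
print(needs_overt_pronoun("1", "p"))  # False: /-khe/ is unambiguous
print(needs_overt_pronoun("3", "p"))  # True: /-kha/ is shared with 2p
```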

Byangsi, Chaudangsi and Darma have similar systems. I
provide examples from Byangsi here. None of these languages
has a dual number, and the distinction of pronominal
marking and agreement differs with regard to tense as well.

Present tense forms of the verb /dza-/ ‘to eat’:

1.s    /dze/         ‘I eat.’
1.p    /dza:gnye/    ‘we eat.’
2.s    /dza:gno/     ‘you eat.’
2.p    /dza:gni/     ‘you (p) eat.’
3.s    /dza:gon/     ‘s/he eats.’
3.p    /dza:gnsn/    ‘they eat.’


One way to form the past tense in Byangsi is to add the prefix /ka-/
to the verb root. Interestingly, there is then a single verb
form for all persons and numbers: /ka-dza/ ‘ate’. Person and
number marking is therefore not represented at all. In the present
tense forms, however, a distinction is found with regard to both number
and person, i.e., first person singular and plural, second person
singular and plural and third person singular and plural. This
means that pronominalization is not as prominent as in Manchad.

In Rongpo the basic distinction is just between first and
non-first person in the singular forms, and the plural form is
identical for all persons. The so-called pronominalization is
therefore a highly reduced system. The verb form is a complex in
which person, number and tense are fused together. It is difficult
to say that a person
category is reflected in the verb form. One has to use the subject
pronouns in order to distinguish second from
third person singular forms, and first, second and third person
plural forms from one another. Examples are given below:

Present tense forms of the verb /tu/ ‘to drink’:

1.s        /tu:ŋ/     ‘I drink.’
2,3.s      /tu:n/     ‘you drink, s/he drinks.’
1,2,3.p    /tu:ni/    ‘we, you (p), they drink.’

From the above discussion we can see that some of the
western Himalayan languages, like Manchad and Byangsi,
do have a category of person fused in their verb forms, but
languages like Rongpo do not make any clear distinction of
person in the verb form. Each language
follows its own system, and to put all these languages under
one cover term ‘pronominalization’ can be misleading.
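
One rough way to compare the four systems is to count how many different verb forms each present-tense paradigm uses against the number of person-number cells it has to cover. The sketch below does only that and is not a measure proposed in the paper: `PARADIGMS` and `summary` are hypothetical names, Manchad and Bunan are represented by their endings alone, and the Byangsi forms are copied as printed, so their transcription may be imperfect.

```python
# One entry per person-number cell, in the order 1st, 2nd, 3rd person.
PARADIGMS = {
    "Manchad": ["g", "si", "ni", "n", "si", "ni", "", "ku", "re"],
    "Bunan":   ["ye", "khe", "ni", "kha", "ni", "kha"],
    "Byangsi": ["dze", "dza:gnye", "dza:gno", "dza:gni", "dza:gon", "dza:gnsn"],
    "Rongpo":  ["tu:ŋ", "tu:ni", "tu:n", "tu:ni", "tu:n", "tu:ni"],
}


def summary(paradigms):
    """Map each language to (distinct forms, person-number cells)."""
    return {lang: (len(set(forms)), len(forms)) for lang, forms in paradigms.items()}


for lang, (distinct, cells) in summary(PARADIGMS).items():
    print(f"{lang}: {distinct} distinct forms for {cells} cells")
```

On these figures Byangsi keeps every cell apart in the present tense, Manchad and Bunan merge some cells, and Rongpo merges half of them, which is in line with the point that one cover term hides quite different systems.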

Bauman (1979), describing pronominalization in the languages of
the eastern sub-group, concludes that
pronominalization in so many TB languages cannot simply be
treated as an influence from outside the TB family and can be
established at the *PTB stage. Moreover, Munda languages
like Santali have a different system of marking subject and
object pronouns in the verb. Another pertinent question at
this stage is why only some languages in the area
were influenced and not others. The Tibetan dialects like Spiti,
Jad, Tod Bhoti and a few others show no sign of this
phenomenon of pronominalization or agreement.

Lexical borrowings from Indo-Aryan sources are found in
all the languages of the pronominalized group. The largest
number of such borrowings is found in Manchad, followed by
Kinnauri, Rongpo, Raji, Byangsi and Chaudangsi. The Tibetan
dialects listed above also lack borrowings from Indo-Aryan.
Final conclusions, however, can be drawn only when a large
amount of data from other Munda and TB languages is
available. Other structural features, like checked consonants,
reduplication, counting by twenties and many other
morphological and syntactic patterns, also need detailed
study before a conclusion can be reached on the Munda
hypothesis in the western Himalayan languages.


Symbols and Abbreviations:

1.     first person        2.      second person
3.     third person        /:/     vowel length
d      dual                ERG     ergative case
p      plural              *PTB    Proto-Tibeto-Burman
PR     present tense       s       singular